AD-A273  823 


AFIT/GAM/ENG/93D-1 


Binaural Room Simulation




THESIS 

Brian  A.  Smith 
1LT, USAF




Approved for public release; distribution unlimited




Binaural  Room  Simulation 

THESIS 

Presented to the Faculty of the School of Engineering
of the Air Force Institute of Technology
Air University
In Partial Fulfillment of the
Requirements for the Degree of
Master of Science (Mathematical Statistics)

Brian  A.  Smith,  B.S.  Chemistry 
1LT, USAF

December,  1993 




Acknowledgements


In most thesis papers, the acknowledgements section is one of the shorter sections in the document. In my case, I would have no trouble making it the longest section; but in keeping with tradition, I will keep this as short as possible. If not for the help of dozens of people, this work would never have been completed. In thanking the many people who helped me, I will start at the very top. No one deserves more credit for this work than the Good Lord Jesus Christ. It is only by the grace of God that I had this opportunity. In His great wisdom He provided me with an ideal research topic and support group.

My thesis committee, composed of Dr. Mark Oxley, Barbara McQuiston, and Dr. Steve Rogers, as well as employees of Armstrong Aerospace Medical Research Laboratories (Marc Ericson, Lt. Bill D'Angelo, and Richard McKinley), were the source of ideas and technical expertise for the research in this thesis. A number of Electrical Engineering students were very helpful in teaching me better ways to use the Sun Workstations in the Signal Processing Lab. I would like to thank John, Neil, Curtis, Gary, Bob, Kim, and Bill for all their help. Also, Capt John Colombi was extremely helpful in answering my questions regarding the use of ESPS. Thank you also to Janet Slilba for her help in developing the filters for this research. If not for the contributions of each of these individuals, my workload would have been even heavier. Finally, I would like to thank my fiancée, Karen, for her support throughout this research effort. Her willingness to spend her weekends at AFIT made the task of completing this work possible.


Brian  A.  Smith 


Table  of  Contents 


Acknowledgements
List of Figures
List of Tables
Abstract
I. Introduction
1.1 Background
1.2 Definitions
1.3 Problem
1.4 Research Objectives
1.5 Scope
1.6 Approach
1.6.1 Mathematical Approach
1.6.2 Computational Approach
1.7 Thesis Outline
II. Literature Review
2.1 Introduction
2.2 Receiver Cues in Auditory Localization
2.2.1 Introduction Receiver Cues
2.2.2 Interaural Time Difference
2.2.3 Interaural Intensity Difference
2.2.4 The Effects of the Pinnae
2.2.5 Head Motion
2.2.6 Conclusion Receiver Cues
2.3 Room Acoustics
2.3.1 Introduction Room Acoustics
2.3.2 Reflections
2.3.3 Reverberation
2.3.4 Diffuse Reflections
2.3.5 Attenuation and Phase Change Due to Reflections
2.3.6 Attenuation Due to the Medium
2.3.7 Conclusion Room Acoustics
2.4 Sound Source Cues
2.4.1 Directivity of the Sound Source
2.4.2 Doppler
2.5 Conclusion
III. Mathematical Analysis
3.1 Introduction
3.2 Modelling the Delay Due to Distance Traveled
3.3 Determining the Angle of Incidence
3.4 Head Related Transfer Function
3.5 Headphone Transfer Function
3.6 Room Transfer Function
3.6.1 Modelling Reflections
3.6.2 Modelling Attenuation
3.7 Overall Transfer Function
3.8 Conclusion
IV. Methodology
4.1 Introduction
4.2 Determining the Angle of Incidence and Distance to the Listener
4.2.1 Room Dimensions
4.2.2 Listener Location and Orientation
4.2.3 Sound Source Locations
4.2.4 MathCad Template Output
4.3 Determining the Angles to Use for Filter Design
4.4 Designing the Filters
4.5 Recording and Filtering a Monophonic Signal
4.6 Accounting for ITD and Time Delay Due to Distance Traveled
4.6.1 ITD
4.6.2 Distance Time Delay
4.6.3 Adding the Delayed Sounds to Simulate Reflections
4.7 Accounting for Attenuation Due to Distance Traveled
4.8 Creating a Binaural Signal
4.9 Increasing the Head Size
4.10 Conclusion
V. Informal Subjective Impressions
5.1 Introduction
5.2 Results of the Initial Investigation
5.2.1 Comparisons for Addition of Reflections without Attenuation
5.2.2 Comparisons with Attenuated Reflections
5.2.3 Comparisons Between Attenuated and Unattenuated Reflections
5.3 Double-Blind Experiment
5.3.1 Purpose and Scope
5.3.2 Results
5.4 Increasing the Virtual Head Size
5.5 Summary of Results
VI. Conclusions and Recommendations
6.1 Summary
6.2 Conclusions
6.3 Recommendations for Future Research
Appendix A. MathCad Output
Appendix B. Speaker Locations for HRTF Measurements
Appendix C. Tables Comparing the HRTF Angles to the MathCad Output
Appendix D. ESPS Manual Pages and C Code
D.1 Addsd
D.2 Delay
D.3 Filter
D.4 Mux
D.5 S32cplay
D.6 S32crecord
Appendix E. Plots of Sample HRTF Filters
Bibliography
Vita


List  of  Figures 

1. Orientation for Azimuth Data
2. Orientation for Elevation Data
3. Calculating a Virtual Source Using the Image Source Method (Adapted from Borish)
4. Two-Dimensional View of Virtual Sources Obtained Using the Image Source Method in a Rectangular Room (Adapted from Borish)
5. An Example of a Diffuse Reflection Resulting from Sound Contacting a Reflective Surface (Adapted from Blauert)
6. Orientation of the Rectangular Room Used in the Simulation
7. Coordinate System (e1, e2, e3) Defining the Position and Orientation of the Listener's Head
8. Orientation for Azimuth in the Mathematical Model
9. Orientation of the Virtual Rooms Around the Room Containing the Sound Source
10. Filter Designed for the Left Ear at Approximately 180 degrees Azimuth
11. Filter Designed for the Right Ear at Approximately 180 degrees Azimuth
12. Filter Designed for the Left Ear at Approximately 278 degrees Azimuth
13. Filter Designed for the Right Ear at Approximately 278 degrees Azimuth
14. Filter Designed for the Left Ear at Approximately 350 degrees Azimuth
15. Filter Designed for the Right Ear at Approximately 350 degrees Azimuth
16. Filter Designed for the Left Ear at Approximately 82 degrees Azimuth
17. Filter Designed for the Right Ear at Approximately 82 degrees Azimuth


List of Tables

1. Location of the Sounds used in the Double-Blind Experiment
2. Responses For the Double-Blind Experiment
3. Comparison of Mathcad Output to Speaker Location used in the Simulation for a Source at -10 Degrees Azimuth
4. Comparison of Mathcad Output to Speaker Location used in the Simulation for a Source at 180 Degrees Azimuth
5. Comparison of Mathcad Output to Speaker Location used in the Simulation for a Source at 20 Degrees Azimuth 50 Degrees Elevation
6. Comparison of Mathcad Output to Speaker Location used in the Simulation for a Source at 120 Degrees Azimuth
7. Comparison of Mathcad Output to Speaker Location used in the Simulation for a Source at -120 Degrees Azimuth
8. Comparison of Mathcad Output to Speaker Location used in the Simulation for a Source at -75 Degrees Azimuth
9. Comparison of Mathcad Output to Speaker Location used in the Simulation for a Source at 75 Degrees Azimuth




Abstract

Research in binaural and spatial hearing is of particular interest to the Air Force. Applications in cockpit communication, target recognition, and aircraft navigation are being explored. This thesis examines human auditory localization cues and develops a mathematical model for the transfer function of a sound signal traveling from an isotropic point source through a rectangular room to both ears of a listener. Using this model as a guide, non-head coupled binaural sound signals are generated in a binaural room simulation. Reflection and attenuation cues included in the computer generated signals are varied in order to determine which cues enhance the listener's degree of extracranialization. Results of this research indicate that adding three or more attenuated reflections to a non-head coupled binaural signal provides the listener with a binaural sound that is localized extracranially.




Binaural  Room  Simulation 


I.  Introduction 

1.1 Background

Human auditory localization is the ability of a human listener to acoustically identify the azimuth, elevation, and range of a sound source (19:1). A human's ability to locate a sound stems primarily from the fact that humans hear with two ears. In a process called binaural fusion, the brain extracts information from the sound at each eardrum to aid in determining the exact location of a sound source (lotOO). In the ideal reproduction, a listener hears the signal played through headphones and perceives the sound as emanating from a three-dimensional point in space (23:171-172).

The development of a system that enables humans to accurately localize sound in three-dimensional space is of particular interest to the Air Force (19) (32). Implementing an auditory localization system in future Air Force cockpits would provide pilots a new source of spatial information that might assist them in performing complicated tasks intuitively. For example, the otherwise tedious task of keeping track of the location of a wingman could be accomplished simply through auditory localization of radio transmissions. Crew aircraft would also benefit from the use of this technology: a pilot, co-pilot, navigator, and bombardier could all identify who is speaking through spatial sound localization. Other applications exist in the areas of target recognition and aircraft navigation (20) (38).

The foundation for an accurate auditory cue synthesizer has been under construction for more than sixty years (23:172). During this time, theory and implementation of the theory have evolved continuously. When a sound is generated and travels to a listener, the acoustic signal that is received at the listener's eardrums is significantly different from the acoustic signal that left the source. These changes result from the acoustic signal's interaction with the acoustic environment, and from the design of the human head and ears (d'i) (If)) (IS) (2d).

Past research indicates that the design of the human head acts to transfer the original sound from a given direction and distance into two signals that are interpreted by the brain (dd). If the transfer functions to each ear can be found, they can then be modeled mathematically and simulated electronically as filters (2d:208). If the transfer functions for all locations were known, it would be possible to artificially place any sound at any location in three-dimensional space.
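When a transfer function is available in sampled form as a finite impulse response, "simulating it electronically as a filter" reduces to a discrete convolution of the input with that response. The following C sketch shows a direct-form FIR filter; the function name fir_filter and its argument layout are illustrative assumptions and are not the interface of the ESPS filter program listed in Appendix D. In this setting the coefficients h would come from measured head related transfer functions.

```c
#include <stddef.h>

/* Direct-form FIR filter (hypothetical helper):
 *   y[n] = sum_{k=0}^{ntaps-1} h[k] * x[n-k]
 * Input samples before x[0] are treated as zero.
 * y must have room for nx output samples. */
void fir_filter(const double *h, size_t ntaps,
                const double *x, size_t nx, double *y)
{
    for (size_t n = 0; n < nx; n++) {
        double acc = 0.0;
        /* k <= n keeps the index n - k inside the input buffer */
        for (size_t k = 0; k < ntaps && k <= n; k++)
            acc += h[k] * x[n - k];
        y[n] = acc;
    }
}
```

Passing a unit impulse through the filter returns the coefficients themselves. This is why an impulse response measurement fully characterizes a filter of this kind.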

This thesis mathematically models the overall transfer function for sound traveling from a source to a listener in a rectangular room and implements the model electronically on a Sun Sparc 2 workstation using the Entropic Signal Processing System (ESPS) designed by Entropic Research Laboratories and copyrighted in 1992. Models for this type of simulation already exist; however, research in this area is still necessary because no real-time working system is yet available (11) ( 1). This thesis develops a non-real-time binaural room simulation and consequently does not incorporate head motion. Past research indicates that head coupling is essential to extracranialization of synthesized sound. Nevertheless, this thesis attempts to improve upon existing auditory cue synthesizers that do not account for head motion by ensuring that the listener does not experience the common problem of in-head localization. With the in-head localization problem, a listener perceives the sound as coming from inside the headphones or from some location along the surface of the head (intracranial lateralization) as opposed to coming from the distance simulated by the electronic filter (extracranialization) (23:173). At a recent conference on Binaural and Spatial Hearing, the importance of research in the area of in-head versus out-of-head localization was emphasized (IS).


1.2 Definitions

This  section  provides  definitions  of  key  terms  that  will  be  used  in  this  thesis. 

Binaural sound is sound that arises from two separate audio signals, one at each ear. The signal that arrives at the left ear is different from the signal at the right ear. The brain uses the differences between the two signals to determine the direction and distance of the source. When sound is played through headphones, the sound is binaural if the signal sent to the left headphone is different from the signal sent to the right (32:3). When binaural sounds are presented without incorporating head motion, the listener normally lateralizes the sound intracranially. Binaural sounds synthesized in this thesis are non-head coupled.

Binaural room simulation is a simulation which incorporates cues from the acoustic environment into the synthesized signal. In this thesis, these cues are added to simulate a sound traveling from a specific location in a rectangular room to each ear of a listener at another specific location in the same room (23:209).

Telepresence  is  the  degree  to  which  a  human  perceives  that  he  or  she  is  present 
in  a  natural  environment  even  though  the  environment  is  artificial  or  virtual. 

Virtual audio consists of synthetically produced audio signals that enable a listener to achieve auditory telepresence. Virtual audio differs from binaural sound in that listeners experiencing virtual audio can localize sound extracranially as opposed to simply lateralizing the sound.

Extracranialized sound is sound that is presented binaurally to a listener and is perceived as coming from some distance from the listener, i.e., outside the head.

Pinna(e) is (are) the human outer ear(s). The design of each person's pinnae is unique, and each set of pinnae transforms sounds differently. In fact, the human head and ears form an antenna system characteristic of the individual (33). This system is frequency and angle dependent.




Head related transfer function (HRTF) is the transfer function which accounts for the filtering effects of the pinnae. Because the filtering of sound by the pinnae is dependent upon direction, there is a different HRTF for each angle of incidence. The angle of incidence is composed of an azimuth angle and an elevation angle. HRTFs differ for each person's pinnae. Some subjects can accurately localize sound sources using “someone else's pinnae” ( 1'2). In nearly all cases, however, listeners localize better using their “own ears”.

Auditory Localization Cues are anything that allows a listener to determine the location of a sound source. Several cues are addressed in this thesis: Interaural Time Difference (ITD), Interaural Intensity Difference (IID), Pinna Effects, Head Motion, Reflections, Reverberation, Attenuation, Sound Source Directivity, and Doppler.

1.3  Problem 

Auditory localization cue synthesizers with head coupling produce extracranialized sound, but the distance cue is not controllable. Systems which do not account for head motion fail to produce extracranialized localization of binaural sound.

1.4  Research  Objectives 

This thesis develops a complete mathematical model representing the overall transfer function for a sound traveling from a source to a listener. This transfer function accounts for the direct path and all reflected paths in a rectangular room. The model is implemented electronically in the form of a binaural room simulation to generate a binaural signal that is extracranialized.

1.5 Scope

This thesis develops a binaural room simulation that enables a listener to accurately determine the azimuth, elevation, and range of a sound source presented over headphones without head coupling. A mathematical model is developed for an arbitrarily sized rectangular room with each of the four walls, the ceiling, and the floor having a known absorption coefficient. By simulating a room, cues from the acoustic environment can be included in the binaural signal. This creates a more realistic auditory experience for the listener and results in extracranialization of synthesized sound. The coefficient of absorption of the air is represented by a constant for a given temperature. The sound source is modelled as a point source emitting isotropic sound, and the listener is assumed to have two ears, each with a characteristic HRTF, separated by a given distance. The case of a stationary source and listener is examined. This is not a real-time simulation due to the extensive amount of processing required to obtain the final binaural sound signals.

Armstrong Aerospace Medical Research Laboratories (AAMRL) provides the HRTF and ITD data used in this simulation (19). The headphones used are intra-aural, or insert, Etymotic ER-2 headphones; standard Walkman type headphones are used on a comparative basis. Monophonic sound recordings are made and processed in time using filters representing the transfer functions for the head and pinna. The signals are also processed in time using the ITD data. Furthermore, the cues resulting from the acoustic environment (reflections and attenuation) are included in the simulation in varying degrees in order to determine how each cue affects the extracranialization of the binaural sound signal.
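The filtering and ITD steps just described can be sketched in C as follows. The monophonic signal is convolved with a left-ear and a right-ear impulse response, and the ear farther from the source is then delayed by the ITD expressed in whole samples. The function names, the sign convention (a positive itd_samples means the sound reaches the left ear first), and the integer-sample delay are assumptions made for illustration; this is not the ESPS processing chain used in the thesis.

```c
#include <stddef.h>

/* Convolves x (nx samples) with impulse response h (nh taps) and writes
 * the result, shifted by delay samples, into y (ny samples). */
static void filter_and_delay(const double *h, size_t nh,
                             const double *x, size_t nx,
                             size_t delay, double *y, size_t ny)
{
    for (size_t n = 0; n < ny; n++)
        y[n] = 0.0;
    /* n runs over the full convolution length nx + nh - 1 */
    for (size_t n = 0; n + delay < ny && n < nx + nh - 1; n++) {
        double acc = 0.0;
        for (size_t k = 0; k < nh; k++)
            if (k <= n && n - k < nx)
                acc += h[k] * x[n - k];
        y[n + delay] = acc;
    }
}

/* Hypothetical helper: builds left and right ear signals from one
 * monophonic signal. itd_samples > 0 delays the right channel
 * (source nearer the left ear); itd_samples < 0 delays the left. */
void make_binaural(const double *x, size_t nx,
                   const double *h_left, const double *h_right, size_t nh,
                   long itd_samples,
                   double *left, double *right, size_t ny)
{
    size_t dl = itd_samples < 0 ? (size_t) (-itd_samples) : 0;
    size_t dr = itd_samples > 0 ? (size_t) itd_samples : 0;
    filter_and_delay(h_left, nh, x, nx, dl, left, ny);
    filter_and_delay(h_right, nh, x, nx, dr, right, ny);
}
```

Applying the filter before the delay keeps the two operations independent, so the same HRTF coefficients can be reused with different ITD values.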

1.6  Approach 

The approach to this research problem consists of two major parts: 1) mathematical modeling of the overall transfer function, and 2) electronically generating the signal described by the mathematical model by means of a binaural room simulation.

1.6.1 Mathematical Approach. The literature will be reviewed, and existing models will be adopted into this thesis' model and modified appropriately. The transfer functions for the headphones and the head are obtained from direct measurements. Both of these transfer functions, along with the room transfer function, are combined to get a single overall transfer function for sound traveling from a point source to each of the two ears of a listener.
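Combining transfer functions in cascade amounts to multiplying them in the frequency domain or, equivalently, convolving their impulse responses in the time domain. The C sketch below (the function name convolve is illustrative) builds a single combined impulse response from two stages, for example a head response and a room response. Filtering once with the combined response then gives the same output as filtering with each stage in turn.

```c
#include <stddef.h>

/* Full linear convolution (hypothetical helper).
 * c must have room for na + nb - 1 samples.
 * Cascading two linear time-invariant filters is equivalent to a single
 * filter whose impulse response is the convolution of the two responses. */
void convolve(const double *a, size_t na,
              const double *b, size_t nb, double *c)
{
    for (size_t n = 0; n < na + nb - 1; n++)
        c[n] = 0.0;
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            c[i + j] += a[i] * b[j];
}
```

For example, stages {1, 2} and {3, 4} combine into {3, 10, 8}. A signal convolved with {1, 2} and then with {3, 4} matches the same signal convolved once with {3, 10, 8}.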

1.6.2 Computational Approach. Portions of the mathematical model are implemented through digital signal processing using a Sun Sparc 2 workstation and the ESPS software. Filters designed from the AAMRL HRTF data are used to synthesize sounds from different angles. The transfer function for the headphones is ignored in the implementation of the mathematical model since the transfer function of the headphones to be used is essentially flat (9). Finally, reflections and attenuation are included in the binaural room simulation. Different combinations are examined to determine which factor(s) enhance the extracranialization of the sound.

1.7  Thesis  Outline 

The  following  chapter  in  this  thesis  is  a  literature  review  of  many  of  the  topics 
mentioned  in  Chapter  I.  Chapter  III  details  the  mathematical  analysis  done  in  this 
thesis.  Chapter  IV  describes  the  methods  used  for  implementing  the  mathematical 
models  on  the  computer.  Chapter  V  provides  the  results  of  this  research  effort, 
and  Chapter  VI  summarizes  this  thesis  and  presents  recommendations  for  follow-on 
research  in  sound  localization. 




II. Literature Review


2.1 Introduction

This chapter reviews the literature in three main areas. The first section covers the changes that occur in the signal as it reaches the receiver; these are the free-field cues to human auditory localization. Next, changes in the characteristics of sound as it propagates from source to receiver are reviewed. The focus in this section is on room acoustics and the changes that occur in a sound signal as it travels through an acoustical environment. The concluding section reviews the aspects of sound as it leaves the source; topics include sound source directivity and Doppler.

2.2  Receiver  Cues  in  Auditory  Localization 

2.2.1 Introduction Receiver Cues. The fundamentals of human auditory localization in the free field consist of four primary cues. Two cues, interaural time difference (ITD) and interaural intensity difference (IID), aid the brain in determining the angle of azimuth to a sound source. (IID is also called interaural pressure level difference (ILD) in some references.) ITD and IID emerged from the duplex theory of sound localization; the critical element of this theory is the human's use of two ears to locate sound sources (39) (28) (29). IID and ITD work together in providing the brain with localization information. The third cue is the effect of the pinnae; the pinnae accentuate and suppress various frequencies as a function of the azimuth and elevation of the sound source. The fourth cue is head motion; this cue is believed to be used by humans to disambiguate front/back and up/down confusions (42) (19) (20). The combination of these four cues provides the brain with the necessary information to accurately localize a sound source (36:21-22) (19) (20).


2.2.2 Interaural Time Difference. Past research has established that interaural time difference is the primary localization cue in the azimuthal plane at low frequencies (17:157). ITD is a binaural, temporal cue; in general, research indicates that both the monaural and binaural temporal cues are most salient in the azimuthal plane (21). Low frequency ITDs are generally larger than higher frequency ITDs and therefore provide a stronger cue (17:160). The exact frequency at which ITD is no longer the major contributor to localization is not clear, but studies have consistently found that this frequency is approximately 1500 Hz. In his study, Kuhn concludes that the trade-off frequency where IID becomes more important to localization than ITD is 1400 Hz. Furthermore, Kuhn finds that there is steady improvement in the ITD cue as the frequency is lowered from 1400 Hz to 500 Hz; but below 500 Hz, there is no improvement in the available ITD cue (17:162).

In a separate study, Abbagnaro et al. reach similar conclusions. They find that ITD is a function of the geometrical interaural distance between the two ears, and that ITD decreases gradually as frequency increases, becoming relatively constant at frequencies above 1000 Hz (1:700). In research done at AAMRL, McKinley and Ericson physically measured the time differences of arrival for sounds of various frequencies. They found that time difference of arrival ranges from 0 to 750 microseconds depending upon the angle of arrival in the horizontal and vertical planes and the frequency of the sound (19:56-63). In experiments which have examined the effects of ITD, listeners wearing headphones have been presented with a binaural signal that contained a time delay to one ear. Subjects perceived that the sound signal originated from the direction corresponding to the ear that received the signal first (31). These types of experiments clearly established ITD as a major cue in sound localization; the following list summarizes the important information concerning ITD.

• Interaural Time Difference is a function of the distance between the two ears, the angle of incidence of the incoming sound, and the frequency of the sound.

• ITD is frequency independent below approximately 500 Hz and above approximately 3000 Hz; furthermore, between these frequencies ITD decreases as frequency increases. For azimuth angles of incidence less than 60 degrees left or right of directly in front of the listener (0 degrees), the minimum ITD occurs between 1400 and 1600 Hz (17:165-166).

• The most significant changes in ITD occur at angles of incidence between 0 and 30 degrees (1:700).

2.2.3 Interaural Intensity Difference. The second concept, which also developed from the duplex theory, is Interaural Intensity Difference (IID). Like ITD, IID results directly from the fact that humans have two ears and is therefore a binaural, spectral cue. Past research indicates that, due to head shadowing, the intensity of the sound at one eardrum is different from the intensity at the other (39:83). Furthermore, IID becomes the primary cue in azimuthal localization as frequency increases (17:162). In a recent experiment testing sound-pressure levels in the human ear canal, Middlebrooks developed contour diagrams which indicated areas of high and low amplitudes. In this experiment, Middlebrooks is primarily interested in the variation in amplitude of single frequency components as a function of sound source location; he calls this the directionality of the sound (22:93). Results of this experiment show that the patterns of directionality are different at different frequencies (22:97). Furthermore, results suggest that the size and shape of a listener's torso (which affects primarily low frequencies), head, and ears affect the amplitude of the signal at different frequencies (22:101). At a recent conference on binaural and spatial hearing, Middlebrooks emphasized the importance of spectral cues in localizing elevation angles as well as in aiding the disambiguation of front/back and up/down reversals (21).

In their research, Abbagnaro et al. find similar results. They find that at the ear receiving the sound directly, the amplitude response sharply increases at high frequencies when the angle of incidence is increased; the maximum rise is observed at an angle of incidence of 60 degrees of azimuth, which corresponds to when the pinna is in approximate confrontation with the sound source (1:699). Furthermore, they find that as the head continues to turn, the response decreases due to shielding by the pinna. At the shadowed ear, the response decreases until approximately 120 degrees, where it reaches a minimum (1:700). The following list summarizes the important information concerning IID.

• Interaural Intensity Difference is a function of the distance between the two ears, the angle of incidence of the incoming sound, and the frequency of the sound. IID can also be affected by the size and shape of the torso, head, and ears. The torso affects primarily the low frequencies.

• The most significant changes in IID occur at angles of incidence between 0 and 30 degrees, consistent with the fact that humans perceive a sound best when the source is directly in front of the listener (1:700).

• Improvement in localization capabilities of humans at frequencies above 3000 Hz is a result of improvement in the IID cue (17:166).

• Although IID provides elevation cues at low frequencies, at frequencies above 8000 Hz IID cues substantially increase human capabilities to localize in elevation as well as in azimuth (22:101).

2.2.4  The Effects of the Pinnae. Under the assumption that the head is a stationary sphere with symmetrically located ear canals and no pinnae, a given ITD or IID places the sound source on a conical shell representing all possible locations (39:85-86). In reality, however, the head is not spherical, it does have pinnae, and it does move. Modelling the head without these features causes ambiguity in the binaural signals presented to listeners, i.e., front/back and up/down reversals (39:86). Sound arriving at each ear follows a unique path, and part of that path includes the pinna. The pinnae filter the mid and high frequencies of incoming sound as a function of the angle of incidence (36:22). The filtering properties of the human outer ear significantly change the signal that is processed by the brain. These changes contain a great deal of information about the location of the sound source (10:18). As a result, the pinna effects, combined with the ITD, IID, and head motion cues, allow accurate localization of sound sources. In addition, the pinna cues seem to enhance the extracranialization of the sound (30:81).
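The cone-of-confusion ambiguity described above can be shown numerically. The minimal sketch below models the ears as two bare points 17.5 cm apart (a hypothetical value) with no head and no pinnae. It uses the Chapter III azimuth convention, where angles to the listener's right are positive. Under these assumptions, a source 30 degrees to the right in front and its mirror image behind the interaural axis (azimuth 150 degrees) produce identical path-length differences, and therefore identical ITDs.

```python
import math

def ear_distances(theta_deg, phi_deg, r=2.0, a=0.0875):
    """Distances from a source to two point 'ears' with no head or pinnae.

    Coordinates: x points out of the nose, y toward the left ear, z up.
    Azimuth theta is positive toward the listener's right; elevation phi is
    positive upward. a is half the interaural distance in metres (hypothetical).
    """
    th, ph = math.radians(theta_deg), math.radians(phi_deg)
    src = (r * math.cos(ph) * math.cos(th),
           -r * math.cos(ph) * math.sin(th),
           r * math.sin(ph))
    left_ear, right_ear = (0.0, a, 0.0), (0.0, -a, 0.0)
    return math.dist(src, left_ear), math.dist(src, right_ear)

def path_difference(theta_deg, phi_deg):
    """Left-ear path minus right-ear path in metres (proportional to the ITD)."""
    d_left, d_right = ear_distances(theta_deg, phi_deg)
    return d_left - d_right

front = path_difference(30, 0)   # 30 degrees to the right, in front
back = path_difference(150, 0)   # mirror position behind the interaural axis
print(abs(front - back) < 1e-12)  # True: same ITD, so front/back is ambiguous
```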

Since the goal of this thesis is to extracranialize sounds, adding the pinna cues is essential. Sound is affected in a variety of ways as it travels through the pinna to the eardrum. The spectral shaping along this path can be accounted for by a group of filters that depend on the incidence angle and frequency: the Head Related Transfer Functions (HRTFs) (39) (19) (32). HRTFs can be measured by producing an impulse at a specific location in an anechoic chamber and recording the output of microphones placed inside a human subject's or manikin's ears.
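When an impulse is used as the test signal, the recorded microphone output is the head-related impulse response (HRIR), whose Fourier transform is the HRTF. Filtering a sound through the HRTF then amounts to convolving it with the HRIR. The sketch below illustrates both steps. The 4-tap HRIRs are hypothetical placeholders (measured responses are much longer), and the direct-form convolution is written out for clarity rather than speed.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out

# Hypothetical HRIRs for one source direction (left and right ear).
hrir_left = [0.0, 0.9, 0.3, -0.1]
hrir_right = [0.0, 0.0, 0.5, 0.2]

# Measurement: a unit impulse passed through the system returns the HRIR itself.
impulse = [1.0, 0.0, 0.0, 0.0]
measured = convolve(impulse, hrir_left)[:len(hrir_left)]
print(measured == hrir_left)  # True

# Rendering: filter a mono source through each ear's HRIR to get a binaural pair.
mono = [1.0, -1.0, 0.5]
left_ear = convolve(mono, hrir_left)
right_ear = convolve(mono, hrir_right)
```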

In 1989, Wightman and Kistler successfully measured the filtering properties of the HRTF using digital signal processing (41) (10). In an earlier experiment, McKinley and Ericson successfully measured the HRTFs for 272 different angles of azimuth and elevation (19). These HRTFs are used in this thesis. McKinley and Ericson made their initial HRTF measurements with an acoustically accurate manikin, the Knowles Electronics Manikin for Acoustic Research (KEMAR). The orientation of the KEMAR manikin for the measurements is shown in Figure 1 and Figure 2.

The 90th percentile pinnae and the 50th percentile head and torso were used in the experiment (19:21). A list of the speaker locations used in these measurements is found in Appendix B, and examples of the HRTFs used in this research are found in Appendix E. Using the filters developed by McKinley and Ericson, the filtering effects of the human outer ear can be accounted for in the binaural room simulation.


Figure 1.  Orientation for Azimuth Data (KEMAR viewed from above, with azimuth marked at 0, 90, 180, and 270 degrees and the left and right ears labeled)

Figure 2.  Orientation for Elevation Data (elevation marked from -90 to 90 degrees)


2.2.5  Head Motion. The importance of head motion in human auditory localization, specifically in the extracranialization of sound, cannot be overstated (19) (‘iU). Without head motion, a person's head and ears can still act as a directional antenna. However, sounds located at 0 degrees azimuth and various elevations (i.e., on the medial plane) can be extremely difficult to localize. When head motion is not allowed, front/back and up/down confusions result because the spectral and temporal cues are very similar at both ears (12) (d). Implementing this cue in a binaural room simulation requires a real-time system. For this reason, head motion is not implemented in the non-real-time simulation developed for this thesis.

2.2.6  Conclusion Receiver Cues. The human brain uses many complex cues to determine the location of a sound source. The four primary cues in the free field are interaural time difference (ITD), interaural intensity difference (IID), the effects of the pinnae, and head motion. ITD is the most prominent cue at low frequencies, while IID becomes more prominent at high frequencies. The pinnae filter the mid and high frequencies of incoming sound to provide the brain with localization cues, and head motion aids localization on or near the medial plane. Together, these four cues give the brain enough information for listeners to accurately localize sound sources in three-dimensional space.

2.3  Room Acoustics

2.3.1  Introduction Room Acoustics. In a normal environment, many cues arise from the listening environment or medium in addition to the free-field cues. Current technology has yet to produce a binaural listening system without head coupling that allows the listener to extracranialize incoming sounds, so it is unclear which characteristics of the sound must be added to solve the in-head localization problem. Current research points to several key cues that arise from room acoustics. These cues result from the sound's interaction with the medium (here, a room) before it reaches the receiver. As sound propagates in a room, the changes in the received signal help the listener localize sounds. Therefore, this section focuses on room acoustics.

2.3.2  Reflections. An acoustical environment is defined by a set of physical properties that determine how a sound changes as it excites the space. The task in a binaural room simulation is to model the environment defined by these properties (18:260). The environment may range from an anechoic chamber, in which there are no reflections, to a reverberation room, in which there are a large number of reflections. In an anechoic environment, listeners localize sound sources very well but are unable to judge distance. In a reverberation room, listeners are unable to localize sound sources. A normal rectangular room lies somewhere between these two extremes. Listeners receive direct source information, but they also receive a variety of reflected sound signals from different directions, each containing different spectral and temporal cues. The brain suppresses many of the early reflections and hears the array of sounds as one entity, a phenomenon explained by the precedence effect (13) (12). These early reflections nevertheless contain information about the listening environment (3) (6).

The reflections must somehow be represented. A common way of accounting for the reflected sounds is to represent each reflection as a reflected sound source. Assigning a linear filter to each reflected source and delaying its signal appropriately defines the spectral and temporal differences specific to that source. Because of the limits of the human auditory system (spatial and temporal resolution) and the precedence effect, any sound field can be simulated by a finite number of reflected sound sources (18:261-264). The more reflecting surfaces an acoustic environment has, the more reflected sound sources are required to define its reflecting properties. The two common methods for locating the reflected sound sources are the ray-tracing method and the image source method.

2.3.2.1  Ray-Tracing Method for Reflections. The ray-tracing model for locating reflected sound sources in a room is based on geometrical acoustics. Sound sources emit a conical beam of rays. These rays travel in straight lines through air, which can be considered a homogeneous, isotropic medium, and are reflected geometrically when they strike a reflective surface (.IT) (26) (16) (18). Each ray emitted by the source carries energy and travels at the speed of sound (37:173). The ray-tracing technique consists of emitting rays from a sound source, following their paths through a room, and recording the reflections off each surface (26:787). This method can account for obstacles within a room, which has led to sophisticated models that include reflecting surfaces (desks, windows) and scattering surfaces (screens, computers) within the room (26). The model developed in this thesis does not account for these factors.
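The core loop of ray tracing can be sketched in two dimensions. The example below follows a single ray in a hypothetical rectangular room with rigid, specularly reflecting walls and records each reflection point together with the distance travelled so far. Dividing that distance by the speed of sound (343 m/s assumed) gives the arrival time. A full ray tracer emits many rays and typically counts those that pass through a detector region around the receiver; that part is omitted here.

```python
import math

def trace_ray(source, angle_deg, width, depth, max_reflections):
    """Follow one ray through a 2-D rectangular room [0, width] x [0, depth].

    Every wall is treated as a rigid specular reflector, so the direction
    component normal to the wall that is struck is reversed. Returns a list of
    (x, y, path_length) tuples, one per reflection, where path_length is the
    distance travelled from the source up to that reflection.
    """
    x, y = source
    dx = math.cos(math.radians(angle_deg))
    dy = math.sin(math.radians(angle_deg))
    travelled = 0.0
    reflections = []
    for _ in range(max_reflections):
        # Ray parameter at which the next vertical (x) and horizontal (y) wall is reached.
        if dx > 0:
            tx = (width - x) / dx
        elif dx < 0:
            tx = -x / dx
        else:
            tx = math.inf
        if dy > 0:
            ty = (depth - y) / dy
        elif dy < 0:
            ty = -y / dy
        else:
            ty = math.inf
        t = min(tx, ty)
        x, y = x + t * dx, y + t * dy
        travelled += t
        reflections.append((x, y, travelled))
        # Reverse the direction component normal to the wall(s) struck (both at a corner).
        if tx <= ty:
            dx = -dx
        if ty <= tx:
            dy = -dy
    return reflections

hits = trace_ray((1.0, 1.0), 0.0, 4.0, 3.0, 3)
arrival_times = [length / 343.0 for _, _, length in hits]  # seconds, c assumed 343 m/s
```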

2.3.2.2  Image Source Method. The image source method (also called the mirror image method) for locating reflected sources is also based on geometrical acoustics (18) (2) (26) (5). It locates a reflected sound source by considering a point on a reflective surface that represents the sound field generated when the original sound source strikes that surface. This point can also be considered a field generated by a secondary (virtual) source, found by mirroring the original source in the plane of the reflecting surface. Figure 3 shows how a virtual source location is determined using the image source method (5). The effects of the reflective surfaces in a room can then be represented by a set of image sources placed symmetrically with respect to those surfaces (24:367). This method is ideal for simulating a rectangular room, but using it in more complex simulations requires a great deal of computational effort (18:265). Figure 4 shows a two-dimensional slice of the virtual sources obtained using the image source method in a rectangular room (5). For complex simulations, the ray-tracing method is more efficient; given unlimited computational effort, however, the ray-tracing and image source methods would give identical results. For the simple rectangular room simulated in this research, the image source method is used to locate the virtual sound sources.

Figure 3.  Calculating a Virtual Source Using the Image Source Method (Adapted from Borish)
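For a rectangular room, the construction in Figure 3 reduces to simple arithmetic. Mirroring across a wall at coordinate value L maps that coordinate u to 2L - u. The sketch below computes the six first-order image sources for a hypothetical room whose corner sits at the origin. It also checks the property that makes the method work: the straight-line distance from an image to the receiver equals the length of the reflected path.

```python
import math

def first_order_images(source, room):
    """Mirror a source across each of the six surfaces of a rectangular room.

    The room occupies [0, Lx] x [0, Ly] x [0, Lz]. Mirroring across the plane
    where a coordinate equals `plane` maps that coordinate u to 2 * plane - u.
    """
    images = []
    for axis, length in enumerate(room):
        for plane in (0.0, length):
            image = list(source)
            image[axis] = 2.0 * plane - source[axis]
            images.append(tuple(image))
    return images

room = (5.0, 4.0, 3.0)         # hypothetical room dimensions in metres
source = (1.0, 2.0, 1.5)
receiver = (3.0, 2.0, 1.5)
images = first_order_images(source, room)

# The straight-line distance from the image behind the x = 0 wall to the
# receiver equals the reflected path source -> (0, 2, 1.5) -> receiver.
reflection_point = (0.0, 2.0, 1.5)
via_wall = math.dist(source, reflection_point) + math.dist(reflection_point, receiver)
print(math.isclose(math.dist(images[0], receiver), via_wall))  # True
```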

Figure 4.  Two-Dimensional View of Virtual Sources Obtained Using the Image Source Method in a Rectangular Room (Adapted from Borish)

Figure 5.  An Example of a Diffuse Reflection Resulting from Sound Contacting a Reflective Surface (Adapted from Blauert)

2.3.3  Reverberation. When a sound wave generated in a room decays exponentially over time, this dying out is called reverberation. The time it takes the sound to decay to one-millionth of its initial mean energy (a reduction of 60 dB) is called the reverberation time (25:558). The reverberation time depends on factors such as the total absorption of the room, the volume of the room, and the speed of sound (25:578-579). Reverberation, or "the late part" of a room response, does not provide significant localization cues for a listener (18). The high reflection density in the reverberant component prevents the human auditory system from identifying individual late reflections. The brain does, however, use the reverberant part of sound. Reverberation cues contain significant information about room size and spaciousness, and they help a listener identify the coloration of sounds (35) (18). Methods exist for modelling the sound field of reverberation (18:272); the technique used most often is direct waveform generation controlled by stochastic parameters.
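The text lists the quantities that reverberation time depends on. One classical relation that ties them together is Sabine's formula, T60 = 24 ln(10) V / (c A), or about 0.161 V/A in SI units. Here A is the total absorption, the sum over surfaces of area times absorption coefficient. Sabine's formula is brought in only as an illustration and is not taken from the thesis. The second function below shows one simple instance of waveform generation controlled by a stochastic parameter: Gaussian noise with an exponential envelope that falls 60 dB every T60 seconds. The room dimensions and absorption coefficients are hypothetical.

```python
import math
import random

def sabine_t60(volume, surfaces, c=343.0):
    """Sabine reverberation time in seconds.

    surfaces is a list of (area_m2, absorption_coefficient) pairs; the total
    absorption A is the sum of area * coefficient, and T60 = 24 ln(10) V / (c A).
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 24.0 * math.log(10.0) * volume / (c * total_absorption)

def reverb_tail(t60, fs, duration, seed=0):
    """Gaussian noise whose amplitude falls by 60 dB (a factor of 1000) every t60 seconds."""
    rng = random.Random(seed)
    n = int(duration * fs)
    return [rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * (i / fs) / t60) for i in range(n)]

# Hypothetical 5 m x 4 m x 3 m room with every surface given alpha = 0.2
lx, ly, lz = 5.0, 4.0, 3.0
surfaces = [(ly * lz, 0.2)] * 2 + [(lx * lz, 0.2)] * 2 + [(lx * ly, 0.2)] * 2
t60 = sabine_t60(lx * ly * lz, surfaces)        # about 0.51 s
tail = reverb_tail(t60, fs=8000, duration=t60)
```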

2.3.4  Diffuse Reflections. One effect that geometrical acoustics neglects is diffuse reflection. When a sound hits a wall, sound energy is reflected in multiple directions, not just the one direction accounted for in the image source method (34) (18) (27). Figure 5 shows an example of the diffusion pattern that occurs when a sound contacts a reflecting surface (18:277). The amount of diffusion and the shape of the diffusion pattern depend on the characteristics of the reflecting surface, the angle of reflection, and the frequency of the sound. Diffuse reflections may be accounted for in the image source method by generating a cloud of virtual sources for each image source; the cloud is determined from the absorption coefficient and the diffusion coefficient of the reflecting surface. Diffuse reflections are not implemented in the binaural room simulation developed in this thesis.

2.3.5  Attenuation and Phase Change Due to Reflections. Another factor that the image source method does not directly account for is the absorption of sound at a reflective surface. The assumption that a reflecting surface is rigid is inaccurate for many acoustical environments. When a sound strikes a reflecting surface, some of its acoustic energy is converted into heat (35) (24) (30) (25). The amount of absorption depends on the characteristics of the reflective surface, the angle of reflection, and the frequency of the sound. The absorption at a surface can be measured, and the measured quantity is called an absorption coefficient. These coefficients represent the average fraction of power absorbed by a surface when sound falls on it (30) (25). Absorption coefficients can be included in a binaural room simulation to represent the attenuation of the sound signal caused by absorption at the reflecting surface.
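An absorption coefficient gives the fraction of incident power that is absorbed, so a fraction (1 - alpha) of the power is reflected. The pressure amplitude is therefore scaled by sqrt(1 - alpha). The minimal sketch below accumulates this factor along a path that strikes several surfaces. Following the simplification above, it ignores frequency and angle dependence. The coefficients are hypothetical.

```python
import math

def reflection_gain(absorption_coefficients):
    """Pressure-amplitude factor after a sequence of reflections.

    Each coefficient alpha is the fraction of incident power absorbed, so the
    amplitude is scaled by sqrt(1 - alpha) at every surface the path strikes.
    """
    gain = 1.0
    for alpha in absorption_coefficients:
        gain *= math.sqrt(1.0 - alpha)
    return gain

# Hypothetical path: off a wall (alpha = 0.19) and then the floor (alpha = 0.36)
g = reflection_gain([0.19, 0.36])
print(round(g, 2))  # 0.72  (= 0.9 * 0.8)
```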

Another change in the characteristics of a sound when it is reflected is a change of phase. Phase change is not considered in the mathematical model or in the binaural room simulation developed in this thesis.

2.3.6  Attenuation Due to the Medium. As sound travels through any acoustic environment, it passes through some medium. The speed at which sound travels through the medium is a function of the medium itself (43:30-35). Furthermore, the amount of acoustic energy absorbed by the medium depends on the characteristics of the medium (humidity and temperature in air, for example) and on the frequency of the sound (18:282). In this thesis, attenuation due to the medium (air) is approximated by scaling the sound by the inverse of the distance traveled (1/d) (18). This 1/d law describes spherical spreading and does not capture the frequency dependence described above.
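Each propagation path therefore contributes a gain of 1/d and a delay of d/c. The sketch below computes both for a single path. It assumes c = 343 m/s, a 44.1 kHz sampling rate for expressing the delay in whole samples, and a 1 m reference distance for the gain; none of these values is specified in the text.

```python
import math

def path_gain_and_delay(distance, c=343.0, fs=44100):
    """Amplitude gain and delay for one straight propagation path.

    The gain uses the 1/d approximation (relative to a 1 m reference); the delay
    is d / c seconds, also returned rounded to the nearest whole sample.
    """
    delay_s = distance / c
    return 1.0 / distance, delay_s, round(delay_s * fs)

gain, delay_s, delay_samples = path_gain_and_delay(3.43)   # 10 ms, 441 samples
# Doubling the distance lowers the level by 20*log10(2), about 6 dB.
drop_db = 20 * math.log10(path_gain_and_delay(2.0)[0] / path_gain_and_delay(4.0)[0])
```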

2.3.7  Conclusion Room Acoustics. As a sound propagates in a room, the acoustic environment it travels through shapes the sound that eventually arrives at the listener. These changes are just as important to a listener's perception of a sound as the changes that take place at the head. A listener receives information about the environment from properties such as reverberation and absorption. Some of the room acoustic cues, however, may be very important in helping a listener localize sounds, particularly in judging distance. The objective of this thesis is to determine which of these properties, if any, help extracranialize binaural sound signals.

2.4  Sound  Source  Cues 

2.4.1  Directivity of the Sound Source. In this thesis, the sound source is treated as a point source. In nature, however, most sources are directional. When a person speaks, for example, sound is radiated at different amplitudes in different directions, so a listener standing behind the speaker receives a completely different sound signal than a listener standing in front. Sound source directivity is a complex cue to model; it is best modelled with directivity filters (18:279), which account for the amplitude and phase characteristics of a source's directivity. Directivity filters are designed by measuring the emitted sound spectra in the direction of interest and in some reference direction, and then dividing the results. The directivity filter is a function of frequency and location (18).
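The design procedure just described (measure the spectra in two directions, then divide) can be written as a per-frequency complex ratio, which keeps both amplitude and phase. The minimal sketch below uses hypothetical four-bin spectra in place of real measurements. Applying the resulting filter to the reference-direction spectrum recovers the spectrum in the direction of interest.

```python
def directivity_filter(spectrum_direction, spectrum_reference):
    """Per-frequency ratio of the emitted spectrum in a direction of interest
    to that in a reference direction (complex values keep amplitude and phase)."""
    return [d / r for d, r in zip(spectrum_direction, spectrum_reference)]

# Hypothetical complex spectra at four frequency bins for a talker measured
# on-axis (reference direction) and from behind (direction of interest).
on_axis = [1.0 + 0j, 0.8 + 0.2j, 0.5 + 0.0j, 0.4 - 0.1j]
behind = [0.9 + 0j, 0.4 + 0.1j, 0.1 + 0.0j, 0.05 - 0.0125j]
D = directivity_filter(behind, on_axis)

# Filtering the on-axis spectrum with D reproduces the spectrum heard from behind.
reconstructed = [d * s for d, s in zip(D, on_axis)]
```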

2.4.2  Doppler. The change in the apparent frequency of a sound as the source moves past a listener is termed the Doppler effect (25:699). This effect is caused by the relative motion between the source and the listener; a similar Doppler shift results if the listener moves with respect to the source. The Doppler effect is a kinematic phenomenon, so motion of the listener can be represented by an opposing motion of the source (25). The Doppler effect is not included in the binaural room simulation because only a stationary source and listener are considered.
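For motion along the line joining source and listener, the textbook Doppler relation is f' = f (c + v_listener) / (c - v_source), with each speed taken as positive when moving toward the other party. The sketch below evaluates it, assuming c = 343 m/s. It also shows that, for speeds well below c, listener motion and source motion at the same speed give nearly (though not exactly) the same shift, which is the sense in which one can stand in for the other.

```python
def doppler_frequency(f_source, v_source=0.0, v_listener=0.0, c=343.0):
    """Frequency heard when source and listener move along the line joining them.

    v_source > 0: source moving toward the listener; v_listener > 0: listener
    moving toward the source. Speeds in m/s, assumed well below c.
    """
    return f_source * (c + v_listener) / (c - v_source)

approaching = doppler_frequency(1000.0, v_source=34.3)   # about 1111 Hz
receding = doppler_frequency(1000.0, v_source=-34.3)     # about 909 Hz
listener_moves = doppler_frequency(1000.0, v_listener=10.0)
source_moves = doppler_frequency(1000.0, v_source=10.0)  # within 1 Hz of listener_moves
```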

2.5  Conclusion 

The human auditory system localizes a sound source by identifying differences in the spectra and arrival times of the sound signals at each ear. Spectral shaping of the sound begins the moment the signal is emitted from the source. Source directivity and relative motion play key roles in shaping the sound that arrives at the listener. As the sound travels to a listener, the acoustic environment continues to shape and delay portions of it. Reflected sounds, reverberation, and attenuation present the listener with a sound signal in the free field that is drastically different from the emitted sound. Head motion and filtering by the pinnae provide additional spectral and temporal cues. Finally, the brain interprets the IID and ITD information. Accurate sound localization in a binaural room simulation requires that the listener receive sufficient psychoacoustic cues. Many of these cues are complex and change as a function of frequency, angle, distance, size, and even the type of material involved.


III.  Mathematical  Analysis 


3.1  Introduction

This chapter contains a mathematical analysis of the localization cues available to a listener in a rectangular room. It develops a transfer function for a sound traveling from a source to each ear of a listener in a rectangular room. The sound source is modeled as an isotropic point source, and the receivers (the left and right ears) are treated as point receivers. The analysis emphasizes modelling room acoustics, but the free-field cues (ITD, IID, and pinna effects) are also included in the model.

3.2  Modelling  the  Delay  Due  to  Distance  Traveled 

The analysis begins by choosing a point in three-dimensional space as the origin, defined as the point where wall 2, wall 3, and the floor intersect (see Figure 6). A monophonic sound signal generated from a point source located at $X_A = (x_A, y_A, z_A)$ within the room is denoted by $s_A(t)$; the subscript $A$ indicates that this is the actual sound source. Suppose this signal travels to a point receiver located at $X_P = (x_P, y_P, z_P)$ within the room. The signal measured at $X_P$ differs from the signal generated at $X_A$ and is denoted by $s(t, X_A, X_P)$. Note that $s(t, X_A, X_A)$ is just the original signal $s_A(t)$. Taking the Fourier transform of the original signal gives:

$$\mathcal{F}\{s_A(t)\} = S_A(\omega)$$

where $\omega = 2\pi f$ denotes the frequency in radians/second and $S_A(\omega)$ is the original signal produced at $X_A$, represented in the frequency domain. This signal does not arrive instantaneously at the receiver. The time delay ($\tau$) of the signal measured at $X_P$ is the distance traveled by the sound ($D_{AP}$) divided by the speed of sound ($c$). Thus,

$$s(t, X_A, X_P) = s_A(t - \tau)$$

where $\tau = D_{AP}/c$ and $D_{AP} = \operatorname{dist}(X_P, X_A) = \lVert X_P - X_A \rVert$ is the Euclidean distance between $X_P$ and $X_A$. In the frequency domain, the time delay is modelled as a modulation, that is, a multiplication by a complex exponential (a linear phase shift):

$$\mathcal{F}\{s(t, X_A, X_P)\} = S(\omega, X_A, X_P) = S_A(\omega)\exp\left(\frac{-i\omega D_{AP}}{c}\right) \tag{1}$$

Figure 6.  Orientation of the Rectangular Room Used in the Simulation
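Equation 1 states that a time delay becomes a multiplication by exp(-i omega tau) in the frequency domain. The sketch below checks this with a naive discrete Fourier transform. It multiplies each bin by exp(-i omega_k tau) and transforms back, assuming a hypothetical 8-sample signal at 8 kHz and a delay of exactly three samples. Bins above N/2 are mapped to negative frequencies so that the result stays real. Because the DFT is periodic, the shift is circular, as with any DFT-based filtering.

```python
import cmath

def dft(x):
    """Naive DFT: X[k] = sum_n x[n] exp(-2*pi*i*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) for k in range(N)]

def idft(X):
    """Inverse of dft()."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N for n in range(N)]

def delay_in_frequency(x, tau, fs):
    """Delay x by tau seconds by multiplying bin k by exp(-i * omega_k * tau).

    omega_k = 2*pi*k*fs/N is the bin frequency in rad/s, with bins above N/2
    treated as negative frequencies.
    """
    N = len(x)
    X = dft(x)
    shifted = []
    for k in range(N):
        kk = k if k <= N // 2 else k - N
        omega = 2 * cmath.pi * kk * fs / N
        shifted.append(X[k] * cmath.exp(-1j * omega * tau))
    return [v.real for v in idft(shifted)]

x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
y = delay_in_frequency(x, tau=3 / 8000, fs=8000)   # approximately [0, 0, 0, 1, 2, 3, 4, 0]
```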

When a sound travels from a source to a listener, there are two receivers: the left and the right ear. Each ear is modeled as a point receiver. The location of the left ear is given by the position vector $X_L$, and the position of the right ear by $X_R$. The locations of the ears depend on the location of the center of the head and on the orientation of the listener's head with respect to the sound source. To determine the location of a listener's ears in free space, a new coordinate system must be defined that represents the listener's viewing angle. The origin of the new coordinate system lies at the midpoint of the line segment connecting the left and right ears. This point is defined as the listener's location and is represented by $X_C = (x_C, y_C, z_C)$; the subscript $C$ denotes the center of the listener's head. The new coordinate system, shown in Figure 7, is related to the original coordinate system by:


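$$e_1 = a_1\hat{\imath} + b_1\hat{\jmath} + c_1\hat{k}$$
$$e_2 = a_2\hat{\imath} + b_2\hat{\jmath} + c_2\hat{k}$$
$$e_3 = a_3\hat{\imath} + b_3\hat{\jmath} + c_3\hat{k}$$

such that $\{e_1, e_2, e_3\}$ is an orthonormal set. That is, $e_\alpha \cdot e_\beta = 0$ for $\alpha \neq \beta$ and $\alpha, \beta \in \{1,2,3\}$, and $\lVert e_\alpha \rVert = 1$ for $\alpha \in \{1,2,3\}$. Therefore, the scalars $\{a_\alpha, b_\alpha, c_\alpha \mid \alpha = 1,2,3\}$ are related to one another. Furthermore, $\hat{\imath}$, $\hat{\jmath}$, and $\hat{k}$ are unit vectors in the $x$, $y$, and $z$ directions, respectively, and the $e_1$ axis represents the direction the nose is pointing. A single matrix, whose rows hold the coefficients of $e_1$, $e_2$, and $e_3$, then defines the orientation of the listener's head with respect to the room. For example, the following matrix represents the case where the listener is looking directly in the $\hat{\imath}$ direction:

$$\begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$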

Since the listener's ears lie on the $e_2$ axis, the left and right ear positions are given by the following vectors:

$$X_L = X_C + \tfrac{d}{2}\,e_2$$

$$X_R = X_C - \tfrac{d}{2}\,e_2$$

where $d$ is the length of the line segment connecting the two ears.

Given the positions of both ears, the distance from any sound source to either ear can be determined. Given the actual source location $X_A$, the distance to the right ear, $D_{AR}$, is:

$$D_{AR} = \operatorname{dist}(X_R, X_A) = \operatorname{dist}\left(X_C - \tfrac{d}{2}\,e_2,\ X_A\right)$$

For the example above, where the listener is looking directly in the $\hat{\imath}$ direction, $e_1 = \hat{\imath}$, $e_2 = \hat{\jmath}$, and $e_3 = \hat{k}$; hence $a_2 = 0$, $b_2 = 1$, and $c_2 = 0$, so that:

$$D_{AR} = \operatorname{dist}(X_R, X_A) = \sqrt{(x_C - x_A)^2 + \left(y_C - \tfrac{d}{2} - y_A\right)^2 + (z_C - z_A)^2}$$

The distance to the left ear is determined in the same way using $X_L$.

Given this distance, the signal at the right ear, accounting for the time delay due to the distance traveled, is obtained by substituting $X_R$ for $X_P$ and $D_{AR}$ for $D_{AP}$ in Equation 1:

$$\mathcal{F}\{s(t, X_A, X_R)\} = S(\omega, X_A, X_R) = S_A(\omega)\exp\left(\frac{-i\omega D_{AR}}{c}\right) \tag{2}$$
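The ear positions and path lengths above translate directly into code. In the minimal sketch below, the orientation matrix is stored with rows $e_1$, $e_2$, $e_3$ (matching the row convention described above), and the ears are placed at $X_C \pm (d/2)e_2$. The head position, source position, and $d$ = 0.18 m are hypothetical, and c = 343 m/s is assumed for the interaural delay.

```python
import math

def ear_positions(center, orientation, d):
    """Left and right ear positions for a head centred at `center`.

    `orientation` is a 3x3 matrix whose rows are e1 (nose direction), e2 and e3
    in room coordinates; the ears lie on the e2 axis, so
    X_L = X_C + (d/2) e2 and X_R = X_C - (d/2) e2.
    """
    e2 = orientation[1]
    left = tuple(c + 0.5 * d * e for c, e in zip(center, e2))
    right = tuple(c - 0.5 * d * e for c, e in zip(center, e2))
    return left, right

looking_along_i = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x_c = (2.0, 2.0, 1.5)   # hypothetical head centre (m)
x_a = (4.0, 3.0, 1.5)   # hypothetical source position (m)
x_l, x_r = ear_positions(x_c, looking_along_i, d=0.18)

d_ar = math.dist(x_r, x_a)       # D_AR
d_al = math.dist(x_l, x_a)       # D_AL
itd_s = (d_ar - d_al) / 343.0    # positive: sound reaches the left ear first
```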

3.3  Determining  the  Angle  of  Incidence 

The angle of incidence at the receiver (the listener's right ear) is also a function of the listener's viewing angle and of the position of the center of the listener's head in free space. The coordinate system defined in the previous section fixes the position and orientation of the head. It remains to determine the angle at which the sound arrives at the listener.

In dealing with the angle of incidence, the azimuth angle ($\theta$) ranges from $-\pi$ to $\pi$ ($-\pi < \theta \le \pi$) and is measured with respect to the $(e_1, e_2, e_3)$ coordinate system. The nose is at $\theta = 0$. Angles on the listener's right (the $-e_2$ direction) are positive; this orientation is shown in Figure 8. The elevation angle ($\phi$) ranges from $-\pi/2$ to $\pi/2$ ($-\pi/2 \le \phi \le \pi/2$) and is also measured with respect to the $(e_1, e_2, e_3)$ coordinate system, so that $\phi = \pi/2$ corresponds to the $e_3$ direction and $\phi = -\pi/2$ corresponds to the $-e_3$ direction. This is identical to the orientation shown in Figure 2. A vector from the right ear to the actual sound source is given by $X_A - X_R$; subtracting $(d/2)e_2$ refers it to the center of the head:

$$X_D = X_A - X_R - \tfrac{d}{2}\,e_2 = X_A - X_C$$

Figure 8.  Orientation for Azimuth in the Mathematical Model (0 degrees at the nose)


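The subscript $D$ denotes difference. The projection of this vector onto the $e_1 e_2$ plane is defined as:

$$X_{D,e_1e_2} = (X_D \cdot e_1)\,e_1 + (X_D \cdot e_2)\,e_2$$

This vector may point in the positive or negative $\theta$ direction depending on the location of the sound source, which must be taken into account when determining $\theta$. Therefore, the following function is defined:

$$f(X_D, e_1, e_2) = \begin{cases} 1 & \text{if } X_{D,e_1e_2} \cdot e_2 \le 0 \\ -1 & \text{if } X_{D,e_1e_2} \cdot e_2 > 0 \end{cases}$$

Now $\theta$ is calculated:

$$\theta = \Theta(X_A, X_C, \mathcal{E}) = f(X_D, e_1, e_2)\,\operatorname{Arccos}\left(\frac{X_{D,e_1e_2} \cdot e_1}{\lVert X_{D,e_1e_2} \rVert}\right)$$

where $\mathcal{E} = \{e_1, e_2, e_3\}$ and Arccos is the principal branch of the arc-cosine function. Similarly, the elevation angle ($\phi$) is calculated:

$$\phi = \Phi(X_A, X_C, \mathcal{E}) = \frac{\pi}{2} - \operatorname{Arccos}\left(\frac{X_D \cdot e_3}{\lVert X_D \rVert}\right)$$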
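The azimuth and elevation formulas can be transcribed almost term by term. In the sketch below, the frame is passed as the tuple $(e_1, e_2, e_3)$. Two small choices go beyond the text and are labeled in the code: the arc-cosine argument is clamped to [-1, 1] to guard against rounding, and the azimuth is set to 0 when the source lies directly above or below the head, where it is undefined.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def incidence_angles(x_a, x_c, frame):
    """Azimuth theta and elevation phi (radians) of source x_a seen from head centre x_c.

    frame = (e1, e2, e3): e1 is the nose direction, e2 points to the left ear,
    e3 points up. Angles to the listener's right (-e2) are positive, so theta
    lies in (-pi, pi] and phi lies in [-pi/2, pi/2].
    """
    e1, e2, e3 = frame
    x_d = [a - c for a, c in zip(x_a, x_c)]     # X_D = X_A - X_C
    p1, p2 = dot(x_d, e1), dot(x_d, e2)         # components of X_D in the e1-e2 plane
    horiz = math.hypot(p1, p2)
    if horiz == 0.0:
        theta = 0.0                              # assumption: straight up/down, azimuth undefined
    else:
        f = 1.0 if p2 <= 0.0 else -1.0           # sign function f(X_D, e1, e2)
        theta = f * math.acos(max(-1.0, min(1.0, p1 / horiz)))   # clamp guards rounding
    cos_el = dot(x_d, e3) / math.hypot(*x_d)
    phi = math.pi / 2 - math.acos(max(-1.0, min(1.0, cos_el)))
    return theta, phi

frame = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
right_side = incidence_angles((0.0, -1.0, 0.0), (0.0, 0.0, 0.0), frame)   # (pi/2, 0)
```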


3.4  Head  Related  Transfer  Function 

The head related transfer function (HRTF), denoted by $H$, was discussed in detail in Chapter 2. Recall that the HRTF is a function of frequency ($\omega$), the source position $X_A = (x_A, y_A, z_A)$, and the point receiver position $X_P = (x_P, y_P, z_P)$. It is also a function of the viewing angle defined by $\mathcal{E}$. For any given source location, there is a separate HRTF for each ear. For the purposes of this analysis, only the right ear is addressed; the analysis for the left ear is identical. Given that the point receiver is the listener's right ear and that $\theta$ and $\phi$ are determined as shown in the previous section, a head related transfer function for the right ear at these angles can be used to filter the incoming sound from the actual sound source. In fact, the HRTF is measured with respect to these angles $\theta$ and $\phi$. In particular, if $H_R(\omega, \theta, \phi)$ denotes the HRTF of the right ear for a given frequency $\omega$, azimuth $\theta$, and elevation $\phi$ (measured relative to the head viewing direction $\mathcal{E}$), then

$$H_R(\omega, X_A, X_C, \mathcal{E}) = H_R\big(\omega, \Theta(X_A, X_C, \mathcal{E}), \Phi(X_A, X_C, \mathcal{E})\big)$$

defines the HRTF in terms of location vectors. This transfer function accounts for the filtering effects of the pinnae. Multiplying the HRTF for the right ear and the delayed sound signal in Equation 2,

$$H_R(\omega, X_A, X_C, \mathcal{E})\,S_A(\omega)\exp\left(\frac{-i\omega D_{AR}}{c}\right) \tag{3}$$

produces a directional monophonic signal for the direct path from the actual source to the right ear. When presented simultaneously with the signal for the left ear, the result is a binaural signal filtered to account for the HRTF data and the time delays (including the ITD, if its frequency dependence is ignored). This is the direct-path sound to the listener.

3.5  Headphone  Transfer  Function 

In general, the headphone transfer function is a function of frequency. For the Etymotic insert headphones used in this thesis, the transfer function is assumed to be constant, per the manufacturer's data. As throughout this analysis, the phase portion of the transfer function is ignored. In other words, a given input sound is attenuated equally in amplitude at all frequencies. If the transfer function is not constant, it can still be measured. In this thesis, the headphone transfer function is assumed to be identical for each earpiece and is denoted $F(\omega)$. In many cases, however, the headphone transfer function differs between the ears, in which case separate functions $F_L(\omega)$ and $F_R(\omega)$ would be used. Multiplying Equation 3 by the inverse of the headphone transfer function gives

$$S(\omega, X_A, X_R) = \frac{1}{F(\omega)}\,H_R(\omega, X_A, X_C, \mathcal{E})\,S_A(\omega)\exp\left(\frac{-i\omega D_{AR}}{c}\right) \tag{4}$$

a directional monophonic signal for the right ear that accounts for the filtering effects of the head and pinna and cancels the effects of the headphones.
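Equation 4 can be evaluated bin by bin once $H_R$ and $F$ are known at the same frequencies. The minimal sketch below assumes these are supplied as lists: HRTF values already selected for the source's $(\theta, \phi)$, and the headphone response. All values are hypothetical, and c = 343 m/s is assumed. With a constant $F(\omega)$, as assumed for the Etymotic earphones, the inversion is just a scale factor. A non-constant response would need care wherever $F$ is close to zero.

```python
import cmath

def right_ear_spectrum(source_spectrum, hrtf, headphone_response, omegas, distance, c=343.0):
    """Equation 4 for a single path, evaluated bin by bin.

    Each bin is (1 / F(w)) * H_R(w) * S_A(w) * exp(-i w D / c), where `hrtf`
    holds H_R sampled at the source's (theta, phi), `headphone_response` holds F,
    and `omegas` are the bin frequencies in rad/s.
    """
    return [
        s * h / f * cmath.exp(-1j * w * distance / c)
        for s, h, f, w in zip(source_spectrum, hrtf, headphone_response, omegas)
    ]

# Hypothetical values: flat source and HRTF, constant headphone gain of 2, D_AR = 3.43 m.
out = right_ear_spectrum([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0],
                         [0.0, 100.0, 1000.0], distance=3.43)
```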

3.6  Room Transfer Function

The room transfer function is composed of various effects, of which reflections and attenuation are the key components. The room modelled in this research has four walls, a floor, and a ceiling. Its orientation is shown in Figure 6.

3.6.1 Modelling Reflections. If a single reflecting surface (wall) is assumed to be present, a virtual source placed symmetrically on the other side of the wall represents the imaginary source of the sound reflected off that wall. This is a direct result of the image source model discussed in Chapter 2. The sound signal at the listener's right ear is now composed of two parts:

$$S_A(\omega)\exp\left(\frac{i\omega D_{AR}}{c}\right) + S_A(\omega)\exp\left(\frac{i\omega D_{VR}}{c}\right)$$

where $D_{VR}$ is the distance from the right ear to the virtual source $X_V$; that is, $D_{VR} = \mathrm{dist}(X_V, X_R)$.

In the case of a room with four walls, a floor, and a ceiling, an infinite number of reflections are possible. Each of these reflections is represented by a virtual source located in one of an infinite number of virtual rooms arranged in three-dimensional space. An arbitrary sound source location is represented by the position vector $X_{(l,m,n)}$ and is located in the $(l,m,n)$ virtual room ($l, m, n \in \mathbb{Z}$). The original sound source location is denoted by $X_{(0,0,0)}$, and $X_{(1,0,0)}$ is a virtual source located in the virtual room adjacent to the actual room in the positive $x$ direction. Figure 9 depicts how the virtual rooms are arranged around the actual room. Sound generated from a source located at $X_{(0,0,0)}$ produces an infinite number of reflections in a rectangular room, so the sound field generated by the actual sound source can be modelled as an infinite number of virtual sources in free space. Let $\mathcal{X}$ denote the collection of the original sound source and all virtual sound source locations; that is, $\mathcal{X} = \{X_{(l,m,n)} \mid l, m, n \in \mathbb{Z}\}$. Given this notation, the signal at the listener's right ear, accounting for all reflections and their corresponding time delays, can now be represented:

$$S(\omega, \mathcal{X}, X_R) = S_A(\omega)\sum_{l,m,n=-\infty}^{\infty}\exp\left(\frac{i\omega D_{(l,m,n)R}}{c}\right)$$

where $D_{(l,m,n)R}$ is the distance from the right ear to the $X_{(l,m,n)}$ source; that is, $D_{(l,m,n)R} = \mathrm{dist}(X_{(l,m,n)}, X_R)$. Given that the azimuth and elevation of each virtual sound source can be determined from the equations in Section 3.3, the model can now be extended to account for the filtering effects of the pinna on each reflection:

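$$S(\omega, \mathcal{X}, X_R) = S_A(\omega)\sum_{l,m,n=-\infty}^{\infty} H_R(\omega, X_{(l,m,n)}, X_C, \xi)\exp\left(\frac{i\omega D_{(l,m,n)R}}{c}\right)$$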

where $H_R(\omega, X_{(l,m,n)}, X_C, \xi)$ is the HRTF corresponding to the angle of the $X_{(l,m,n)}$ sound source for the right ear, and $X_C$ is considered a function of $X_R$. The same calculations could be performed for the left ear. Presenting the two resulting signals simultaneously to the appropriate ears produces a binaural sound that accounts for all reflections, each appropriately delayed. Attenuation due to distance traveled and due to absorption at the walls has yet to be considered.
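The image-source bookkeeping behind $X_{(l,m,n)}$ can be sketched in a few lines of Python. The sketch assumes the room occupies $[0, L_x] \times [0, L_y] \times [0, L_z]$ and uses a common indexing in which even room indices hold translated copies of the source and odd indices hold mirrored copies, so that $X_{(l,m,n)}$ is reached after $|l| + |m| + |n|$ reflections. The function names are hypothetical, and the 344 m/s speed of sound is the value used later in Section 4.6.2.

```python
import math

def image_coordinate(x, length, k):
    """Coordinate of the image source in virtual room k along one axis,
    for a room spanning [0, length]: even k translates, odd k mirrors."""
    if k % 2 == 0:
        return k * length + x
    return k * length + (length - x)

def virtual_source(src, dims, l, m, n):
    """Return X_(l,m,n) and its reflection count h = |l| + |m| + |n|."""
    pos = tuple(image_coordinate(s, d, k) for s, d, k in zip(src, dims, (l, m, n)))
    return pos, abs(l) + abs(m) + abs(n)

def path_to_ear(src, dims, ear, l, m, n, c=344.0):
    """Distance D_(l,m,n) from X_(l,m,n) to an ear, its delay in seconds, and h."""
    pos, h = virtual_source(src, dims, l, m, n)
    dist = math.dist(pos, ear)
    return dist, dist / c, h
```

The six first-order images used in the simulations correspond to the index triples with a single entry of +1 or -1 and the rest zero.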

Figure 9. Orientation of the Virtual Rooms Around the Room Containing the Sound Source

3.6.2 Modelling Attenuation. Recall from Chapter 2 that as a sound propagates in a room, it is attenuated due to absorption by 1) the medium through which it travels (air) and 2) the surfaces from which it is reflected (the walls, ceiling, and floor). The attenuation due to absorption by the medium of a sound traveling from $X_A$ to $X_R$ is a function of the distance the sound travels ($D_{AR}$) and of the properties of the medium ($\varepsilon_0$). The properties of the medium include items such as composition (oxygen, nitrogen, etc.), humidity, and temperature. Including this attenuation as the original signal travels directly to $X_R$ results in the following:


$$S(\omega, X_A, X_R) = S_A(\omega)\,\frac{\varepsilon_0\exp[i(\omega/c)D_{AR}]}{4\pi D_{AR}}$$


which represents the sound at the right ear prior to filtering by the pinna. In the simulations completed in this thesis, $\varepsilon_0/4\pi$ is assumed to equal 1. Again, this model could be extended to account for all reflections and for filtering by the appropriate HRTFs; first, however, the attenuation by the walls will be included.

Absorption coefficients for hundreds of different materials can be found in the literature (14). The sound absorption coefficient ($\alpha$) is directly related to the coefficient of reflection ($\beta$) by the following equation (2):


$$\alpha = 1 - \beta^2$$


Like $\alpha$, $\beta$ is a function of frequency and angle. In this thesis, $\beta$ is considered to be the same for all reflective surfaces. Given this relation, attenuation caused by absorption at the walls can be accounted for by filtering the appropriate frequencies by $\beta_{(l,m,n)}(\omega)$, where $\beta_{(l,m,n)}(\omega)$ filters the $X_{(l,m,n)}$ source $h$ times ($h = |l| + |m| + |n|$), once for each reflection along that path; that is, $\beta_{(l,m,n)}(\omega) = \beta(\omega)^h$.
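A short sketch of the per-path amplitude factor described in this section: $\beta^h$ from the wall reflections times the spreading term proportional to $1/D$. It assumes a single frequency-independent absorption coefficient shared by all surfaces (the thesis allows $\beta$ to vary with frequency and angle), takes $\beta = \sqrt{1 - \alpha}$ from $\alpha = 1 - \beta^2$, and sets the constant in front of $1/D$ to 1 as in the simulations; the function names are hypothetical.

```python
import math

def reflection_coefficient(alpha):
    """beta obtained from alpha = 1 - beta**2 (frequency-independent)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return math.sqrt(1.0 - alpha)

def path_gain(alpha, h, distance_m, spreading_scale=1.0):
    """Amplitude factor for one image path: beta**h for h wall reflections
    times spreading_scale / D (spreading_scale = 1 in the simulations)."""
    return spreading_scale * reflection_coefficient(alpha) ** h / distance_m
```

For example, $\alpha = 0.36$ gives $\beta = 0.8$, so a doubly reflected path of 2 m length has an amplitude factor of $0.8^2 / 2 = 0.32$.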

3.7  Overall  Transfer  Function 

The  final  step  of  this  analysis  is  to  put  the  pieces  of  the  previous  sections 
together.  The  overall  signal  to  the  right  ear  is  described  by: 

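$$S(\omega, \mathcal{X}, X_R) = \frac{\varepsilon_0 S_A(\omega)}{4\pi F(\omega)}\sum_{l,m,n=-\infty}^{\infty}\beta_{(l,m,n)}(\omega)\,H_R(\omega, X_{(l,m,n)}, X_C, \xi)\left(\frac{\exp[i(\omega/c)D_{(l,m,n)R}]}{D_{(l,m,n)R}}\right) \tag{6}$$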

3.8 Conclusion


Given the overall transfer function for an isotropic sound signal traveling to a receiver through a rectangular room, a separate transfer function can be calculated for each ear as follows:


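$$S(\omega, \mathcal{X}, X_L, \xi) = \frac{\varepsilon_0 S_A(\omega)}{4\pi F(\omega)}\sum_{l,m,n=-\infty}^{\infty}\beta_{(l,m,n)}(\omega)\,H_L(\omega, X_{(l,m,n)}, X_C, \xi)\left(\frac{\exp[i(\omega/c)D_{(l,m,n)L}]}{D_{(l,m,n)L}}\right) \tag{7}$$

and,

$$S(\omega, \mathcal{X}, X_R, \xi) = \frac{\varepsilon_0 S_A(\omega)}{4\pi F(\omega)}\sum_{l,m,n=-\infty}^{\infty}\beta_{(l,m,n)}(\omega)\,H_R(\omega, X_{(l,m,n)}, X_C, \xi)\left(\frac{\exp[i(\omega/c)D_{(l,m,n)R}]}{D_{(l,m,n)R}}\right) \tag{8}$$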


where $X_L$ denotes the location of the left ear. A binaural sound can now be presented to a listener by simultaneously playing each sound to the appropriate ear. This model is the foundation for the binaural room simulation completed in this thesis.
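Equations 6 to 8 can be approximated by truncating the triple sum. The Python sketch below builds a room impulse response for one ear under simplifying assumptions that the thesis model does not make: $H = 1$ (no HRTF), $F = 1$ (flat headphones), a frequency-independent reflection coefficient $\beta = \sqrt{1-\alpha}$, the constant in front of $1/D$ set to 1, a cube of images with $|l|, |m|, |n| \le$ `order`, and each delay $D/c$ rounded to the nearest 40 kHz sample. The function names are hypothetical.

```python
import math
import numpy as np

def image_coordinate(x, length, k):
    # Even k: translated copy of the source; odd k: mirrored copy.
    return k * length + (x if k % 2 == 0 else length - x)

def room_impulse_response(src, ear, dims, alpha, order, fs=40_000, c=344.0):
    """Truncated image-source sum with H = 1 and F = 1: each image with
    |l|, |m|, |n| <= order adds beta**h / D at sample round(fs * D / c)."""
    beta = math.sqrt(1.0 - alpha)
    taps = []
    for l in range(-order, order + 1):
        for m in range(-order, order + 1):
            for n in range(-order, order + 1):
                pos = [image_coordinate(s, d, k) for s, d, k in zip(src, dims, (l, m, n))]
                dist = math.dist(pos, ear)
                h = abs(l) + abs(m) + abs(n)
                taps.append((int(round(fs * dist / c)), beta ** h / dist))
    rir = np.zeros(max(idx for idx, _ in taps) + 1)
    for idx, amp in taps:
        rir[idx] += amp  # images arriving in the same sample add up
    return rir
```

Convolving a dry recording with this response gives a rough time-domain counterpart of the model without the HRTF and headphone terms; the HRTF cannot be applied after summation because it differs from path to path.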




IV.  Methodology 


4.1 Introduction

The mathematical model developed and presented in Chapter 3 represents the completion of the first research objective of this thesis. The second research objective, as stated in Chapter 1, was to implement portions of the mathematical model electronically in order to generate a binaural signal that can be extracranialized. The process of adding the effects of different portions of the model into a simple binaural room simulation can be broken down into eight separate parts:

1. Setting the dimensions of the room, placing the sound source and the listener at specific locations in the room, and determining the angle of incidence and distance to the listener from the sound source and chosen virtual sources.

2.  Determining  the  angles  to  be  used  in  the  design  of  the  Head  Related 
Transfer  Function  (HRTF)  filters  from  the  angle  of  incidence  data. 

3.  Designing  the  FIR  filters  (from  the  AAMRL  HRTF  data)  for  both  the 
left  and  right  ears  for  all  appropriate  angles. 

4.  Recording  a  monophonic  signal  and  filtering  the  recorded  sound  with  each 
of  the  HRTF  filters  to  yield  a  collection  of  monophonic  sound  files. 

5.  Processing  the  monophonic  sound  files  to  account  for  ITD  and  time  delay 
to  the  listener  due  to  the  distance  the  sound  travels. 

6. Processing the signals already including time delay information to account for attenuation of the sound due to distance traveled from source to listener.

7. Combining appropriate monophonic sound files to yield a collection of binaural sound files, each containing different degrees of free-field and room-related effects.


8. Varying the ITD cue to match the listener's head diameter.

This chapter will first review each of the steps that were taken in this binaural room simulation in order to generate the final binaural signals. Finally, the method of determining whether or not a particular sound could be accurately localized and/or extracranialized will be discussed.

4.2 Determining the Angle of Incidence and Distance to the Listener

4.2.1 Room Dimensions. Using the mathematical model developed in Chapter 3, a MathCad template was developed. Using this template, specific room dimensions, sound source locations, and listener location and orientation were selected and entered into the file. The template allowed for any rectangular room. For the simulations performed in this thesis, the room dimensions were selected to be 15 meters in width ($x$-axis), 20 meters in length ($y$-axis), and 5 meters in height ($z$-axis).

4.2.2 Listener Location and Orientation. The listener could be placed at any location in the room, and his head could be oriented in any direction. For each of the seven simulations performed in this thesis, the listener was located in the same position and orientation, $(.55L_x, .75L_y, .15L_z)$, always looking in the $+x$ direction at Wall 1 with ears parallel to the $xy$ plane. Given this information, the locations of the left and right ears were pinpointed. This detailed ear location was not critical to the simulations run in this thesis; however, knowledge of the exact location of each ear may be useful in follow-on research that includes a head-tracker and the effects of head motion.

4.2.3 Sound Source Locations. The actual sound source could be placed anywhere inside the room. The sound sources for simulating reflections are placed outside the room; they are the virtual sound sources determined by the image source method. Given a single sound source location, the template generated virtual source locations representing the reflections off each of the four walls, the floor, and the ceiling. Seven different original source locations were run in this thesis research:

1.  (.58L,..8.Uy,.20/.J 

2. $(.75L_x, .80L_y, .95L_z)$

3. $(.28L_x, .75L_y, .15L_z)$

4. $(.48L_x, .65L_y, .20L_z)$

5. $(.48L_x, .85L_y, .01L_z)$

6.  {.75Ljc,-7SLy,.l7tL.) 

7. $(.58L_x, .65L_y, .20L_z)$

Six virtual source locations were determined in each of the seven cases above; these virtual sources corresponded to the first reflection off each of the four walls, the floor, and the ceiling. The seventh case in this list included ten reflections. In this case, four additional virtual sources were calculated; these sources represented the following four double reflections:

•  Reflection  off  the  Floor  then  Wall  1 

•  Reflection  off  the  Floor  then  Wall  3 

•  Reflection  off  the  Floor  then  the  Ceiling 

•  Reflection  off  the  Ceiling  then  the  Floor 

Given the actual sound source location and the virtual sound source locations, the user need not make any more inputs; the computations were done by the MathCad program.

4.2.4 MathCad Template Output. Once the room dimensions, listener location and orientation, and sound source location had been input, the program output could be viewed. The template was designed to give the angle of incidence to both the left and the right ear for all sound sources, actual and virtual. This angle was given in two parts: first the azimuth angle, then the elevation angle. Finally, the program output provided the acoustic path length to the left and right ear from each of the sound sources. A sample of the MathCad output can be found in Appendix A.
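The angle computation performed by the template can be sketched as follows for a level head. The equations of Section 3.3 are not reproduced here; the azimuth convention (counterclockwise from straight ahead, so 90° is to the left and 270° to the right), the `look_dir_deg` parameter measured from the $+x$ axis, and the function name are assumptions. Passing an ear position as `listener` gives per-ear angles and the acoustic path length.

```python
import math

def incidence_angles(source, listener, look_dir_deg=0.0):
    """Azimuth and elevation (degrees) of `source` as heard at `listener`
    for a level head facing horizontal direction look_dir_deg (degrees
    counterclockwise from +x), plus the straight-line path length.
    Azimuth is counterclockwise from straight ahead: 90 = left, 270 = right."""
    dx, dy, dz = (s - p for s, p in zip(source, listener))
    horizontal = math.hypot(dx, dy)
    azimuth = (math.degrees(math.atan2(dy, dx)) - look_dir_deg) % 360.0
    elevation = math.degrees(math.atan2(dz, horizontal))
    return azimuth, elevation, math.hypot(horizontal, dz)
```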

4.3 Determining the Angles to Use for Filter Design

As was stated in Chapter 2, the HRTF data used in this thesis were collected at the Armstrong Aerospace Medical Research Laboratory (AAMRL) located at Wright-Patterson Air Force Base (WPAFB). The data collection was accomplished in an anechoic chamber containing a dome of 272 speakers. A list of the speaker locations by angle of azimuth and elevation can be found in Appendix B. Because of the limited number of speaker locations, a perfect match to the incidence angles obtained from the MathCad template was not likely, so approximations were necessary. The group of tables located in Appendix C shows the approximate angles of azimuth and elevation given in the MathCad output and the angles corresponding to the speaker used to design the HRTF filters for each of the seven simulations. Speaker locations were chosen by determining which speaker provided the closest approximation to the azimuth and elevation angles reported in the MathCad output. The speaker locations and corresponding ITDs are shown in Appendix B.
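The text does not say how "closest" was measured. This Python sketch assumes the great-circle angle between the target direction and each speaker direction, which also handles azimuth wrap-around at 0/360°. The speaker list in the test values is made up, not the AAMRL dome layout in Appendix B, and the function names are hypothetical.

```python
import math

def direction(az_deg, el_deg):
    """Unit vector for an (azimuth, elevation) pair given in degrees."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def angular_separation(a, b):
    """Great-circle angle in degrees between two (azimuth, elevation) pairs."""
    dot = sum(p * q for p, q in zip(direction(*a), direction(*b)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))  # clamp rounding

def nearest_speaker(target, speakers):
    """Speaker (az, el) whose direction is closest to the target direction."""
    return min(speakers, key=lambda spk: angular_separation(target, spk))
```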

With the speaker locations chosen, the next step in the simulation was to take the raw data for each of the selected speakers and develop FIR filters that would represent the HRTFs for the corresponding angles of incidence.

4.4  Designing  the  Filters 

The filter design for this room simulation was accomplished using a Sun Sparc 2 workstation and the ESPS software. The “wmsefilt” command was used in the design of all left and right ear filters. The band edges and responses were determined from the AAMRL raw data, and a weighting function of 1 was employed for each band. The manual pages for this ESPS command, as well as for all ESPS commands used in this thesis effort, can be found in Appendix C. The filters designed using this command were 93-tap FIR filters; examples for both the left and right ear can be found in Appendix D. The filters were generated using data taken from measurements on a KEMAR head and bust, with a sampling frequency of 40 kHz. A left and right ear filter were designed for each of the chosen speaker locations.
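The ESPS “wmsefilt” algorithm is not reproduced here. As a rough stand-in, the NumPy sketch below fits an odd-length, linear-phase FIR filter (93 taps at 40 kHz by default) to a piecewise-linear magnitude target by weighted least squares on a dense frequency grid, with unit weighting unless weights are supplied. Phase is ignored, as elsewhere in this analysis. The break frequencies and gains in the test values are invented, not AAMRL measurements, and the function name is hypothetical.

```python
import numpy as np

def ls_linear_phase_fir(freqs_hz, gains, numtaps=93, fs=40_000, weights=None, grid=2048):
    """Weighted least-squares, type-I linear-phase FIR fit to a piecewise-
    linear magnitude target given at the break frequencies freqs_hz."""
    if numtaps % 2 == 0:
        raise ValueError("numtaps must be odd")
    half = (numtaps - 1) // 2
    f = np.linspace(0.0, fs / 2, grid)
    target = np.interp(f, freqs_hz, gains)
    w = np.ones_like(f) if weights is None else np.interp(f, freqs_hz, weights)
    omega = 2 * np.pi * f / fs                               # rad/sample
    # Zero-phase amplitude A(omega) = a_0 + sum_k a_k cos(k * omega)
    basis = np.cos(np.outer(omega, np.arange(half + 1)))
    root_w = np.sqrt(w)
    a, *_ = np.linalg.lstsq(basis * root_w[:, None], target * root_w, rcond=None)
    h = np.empty(numtaps)
    h[half] = a[0]
    h[half + 1:] = a[1:] / 2                                 # symmetric taps
    h[:half] = a[:0:-1] / 2
    return h
```

Filtering a recording with the resulting taps, as the “filter” command does in the next step, is then a plain convolution such as `np.convolve(speech, h)`.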

4.5 Recording and Filtering a Monophonic Signal

This step of the research was also accomplished using the binaural mixing console mentioned in the previous section. In this case, the specific commands used were “s32crecord” and “filter”. The manual pages for these commands are found in Appendix C. The first command, “s32crecord”, allows for the recording of a monophonic signal using an Ariel A/D converter. For the simulations carried out in this thesis, the recorded sound was male speech, and the sampling frequency of the recording was 40 kHz.

Once a sound data file was created, the monophonic sound file was filtered with each of the previously designed filters using the “filter” command. For six of the seven simulations, the result of this step was 14 sound files: seven left and seven right. The seven files for each ear are the result of the direct path, the four wall reflections, and the reflections off the floor and ceiling. In the seventh case, four additional virtual sources representing double reflections were included, so the result was 22 files. Each of these files contained a monophonic speech signal that had been filtered to account for the HRTF, but the signals still needed to be appropriately delayed.




4.6 Accounting for ITD and Time Delay Due to Distance Traveled

4.6.1 ITD. As was stated in Chapter 2, ITD is a critical cue in sound localization, particularly at frequencies below 3000 Hz. Clearly, this cue had to be included in the filtered sound files in order to generate localizable binaural sound files. ITD is a function of angle of incidence and frequency. The frequency dependence of this cue, although genuine, would be very complex to model, and research indicates that it is not critical to sound localization (12) (19). In this simulation, the frequency dependence of the ITD cue is ignored, and ITD is assumed to be constant for a given angle. The ITDs used in this thesis were measured at AAMRL using a KEMAR manikin with a head radius of 8.75 cm. The measurements can be found in Appendix A. The ITD for each source was found from the data in Appendix B and then added to the sound corresponding to the ear farther from the sound source. The delay was added into the sound files using the ESPS command “delay” (see Appendix C).
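A minimal sketch of the ITD step, assuming the frequency-independent ITD described above, a 40 kHz sampling rate, and rounding of the ITD to whole samples (how the ESPS “delay” command handles fractional delays is not described here). The function name and its `far_ear` argument are hypothetical.

```python
import numpy as np

def apply_itd(left, right, itd_s, far_ear, fs=40_000):
    """Delay the channel of the ear farther from the source by the ITD,
    rounded to whole samples, then zero-pad both to equal length."""
    lag = int(round(itd_s * fs))
    left, right = np.asarray(left, float), np.asarray(right, float)
    if far_ear == "left":
        left = np.concatenate([np.zeros(lag), left])
    elif far_ear == "right":
        right = np.concatenate([np.zeros(lag), right])
    else:
        raise ValueError("far_ear must be 'left' or 'right'")
    n = max(len(left), len(right))
    return np.pad(left, (0, n - len(left))), np.pad(right, (0, n - len(right)))
```

At 40 kHz one sample is 25 µs, so whole-sample rounding can shift an ITD by up to 12.5 µs; finer control would need a fractional-delay filter.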

4.6.2 Distance Time Delay. In addition to the delay added for ITD, both left and right ear signals had to be delayed for the distance traveled by the sound. The distance traveled was calculated by averaging the left and right ear distances given in the MathCad output discussed previously. The calculated distance was divided by the speed of sound (344 m/s for dry air at room temperature) to obtain the time delay resulting from the distance traveled. This delay was added into all of the sound files using the “delay” command.
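The distance delay reduces to simple arithmetic. This sketch uses the 344 m/s and 40 kHz values from the text and rounds to whole samples (an assumption); the function names are hypothetical. For example, ear path lengths of 3.40 m and 3.48 m average to 3.44 m, which is a 10 ms delay, or 400 samples.

```python
import numpy as np

def distance_delay_samples(d_left_m, d_right_m, fs=40_000, c=344.0):
    """Propagation delay in samples from the mean ear path length / c."""
    return int(round(0.5 * (d_left_m + d_right_m) / c * fs))

def delay_both(left, right, lag):
    """Prepend the same number of zeros to both ear signals."""
    pad = np.zeros(lag)
    return np.concatenate([pad, left]), np.concatenate([pad, right])
```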

4.6.3 Adding the Delayed Sounds to Simulate Reflections. The delayed files now needed to be combined to begin simulating reflections. In this step, left and right ear sound files were dealt with separately. Using the ESPS command “addsd”, the direct path for the left ear was added to the Wall 1 reflection for the left ear. The result was a sound file that contained information for the direct path and one reflection. Next, this direct-plus-one-reflection file was added to the Wall 2 reflection for the left ear, yielding a file containing the direct path and two reflections for the left ear. This was done again for the Wall 3 reflection, the Wall 4 reflection, the floor reflection, and the ceiling reflection. The same procedure was followed for the right ear. The result was again 14 files of interest: seven left and seven right. Each file contained appropriate time delays, accounted for HRTF data, and contained varying degrees of reflection information.
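The running-sum procedure performed with “addsd” can be sketched with NumPy. The sketch assumes the per-path signals for one ear are already delayed and ordered as direct path, Walls 1 to 4, floor, ceiling, and it zero-pads shorter signals to a common length; the function name is hypothetical.

```python
import numpy as np

def cumulative_mixes(paths):
    """Running sums of one ear's path signals, in the order given
    (direct, Wall 1, Wall 2, Wall 3, Wall 4, floor, ceiling), so mix k
    holds the direct path plus the first k reflections."""
    n = max(len(p) for p in paths)
    mixes, acc = [], np.zeros(n)
    for p in paths:
        acc = acc + np.pad(np.asarray(p, float), (0, n - len(p)))  # zero-pad, then add
        mixes.append(acc)
    return mixes
```

Applied to seven path signals, this returns the seven files per ear described above.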

4.7 Accounting for Attenuation Due to Distance Traveled

In Chapter 3, the mathematical model accounting for sound travel through a room included attenuation resulting both from the distance the sound traveled through the air and from absorption at the walls from which it is reflected. If we assume the walls, ceiling, and floor are all made from the same type of material, the attenuation due to absorption at the walls is a common factor in every single-reflection path and may be factored out of all but the direct path sound. The effect of this attenuation is small compared to the attenuation of the sound from all sources, including the direct path, that results from the distance traveled through the air. As a sound moves from a source to a listener, it is attenuated in proportion to the reciprocal of the distance traveled.

This attenuation can be implemented using the floating-point scaling factor provided by the delay command (scale option). Using the scale option, the 14 sound files mentioned in Section 4.6.2 (before the sounds are added to simulate reflections) were appropriately scaled to yield 14 sound files containing HRTF, time delay, and attenuation information. These files were added to simulate reflections using the same procedure described in Section 4.6.3.
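A sketch of the 1/D scaling applied before mixing, analogous to the scale option of the “delay” command. The optional `reference_m` argument is an addition not described in the thesis: scaling by $D_{\mathrm{ref}}/D$ keeps a path of the reference length (for example, the direct path) at unit gain, which may help avoid clipping, while leaving the level ratios between paths unchanged. The function name is hypothetical.

```python
import numpy as np

def attenuate(signals, distances_m, reference_m=None):
    """Scale each delayed path signal by 1/D. With reference_m given, use
    reference_m / D so that a path of that length keeps unit gain; the
    level ratios between paths are the same either way."""
    ref = 1.0 if reference_m is None else reference_m
    return [np.asarray(s, float) * (ref / d) for s, d in zip(signals, distances_m)]
```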

4.8 Creating a Binaural Signal

Using the ESPS command “mux”, two sound files can be combined into a single sound file containing a left and a right ear signal. When played back through an Ariel D/A converter and stereo amplifier using the command “s32cplay”, these combined files are binaural: a separate signal is heard at each ear. Using this procedure, the monophonic sound files obtained in Section 4.6.3 and Section 4.7 were combined to yield 14 binaural sound files: seven containing time delay information and no attenuation, and seven containing both time delay and attenuation information. The files all contained HRTF data and various numbers of reflections. These 14 sound files (22 in one case) were then available for each of the seven simulations and were used to judge which factors did and did not affect the extracranialization of the binaural signals.
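The “mux” and “s32cplay” commands are specific to the ESPS and Ariel setup. As a rough analogue only, this sketch interleaves two mono channels into a 16-bit stereo WAV file in memory using Python's standard `wave` module. Scaling both channels by one common peak value (an assumption) keeps the interaural level differences intact; the function name is hypothetical.

```python
import io
import wave
import numpy as np

def mux_to_wav(left, right, fs=40_000):
    """Interleave left/right channels into a 16-bit stereo WAV (bytes)."""
    n = max(len(left), len(right))
    stereo = np.zeros((n, 2))
    stereo[:len(left), 0] = left
    stereo[:len(right), 1] = right
    peak = np.max(np.abs(stereo)) or 1.0                   # avoid division by zero
    pcm = np.round(stereo / peak * 32767).astype("<i2")   # one scale for both ears
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)
        wf.setframerate(fs)
        wf.writeframes(pcm.tobytes())  # row-major (n, 2) gives L, R, L, R, ...
    return buf.getvalue()
```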

4.9 Increasing the Head Size

In an effort to increase the extracranialization of the sounds, a final step was added to each simulation. The ITD cue was increased by 25, 50, and 100 percent to simulate an enlargement of the listener's head. The resulting signals contained information identical to that described in Section 4.8, with the exception of the increased delay to the ear farther from the sound source. The increased ITD cue was included in the direct path and all reflections. The results of this adaptation are also found in Chapter 5.
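The head-size manipulation amounts to scaling each far-ear delay. In Woodworth's spherical-head approximation the ITD is $(a/c)(\theta + \sin\theta)$ for head radius $a$ and azimuth $\theta$, so it grows linearly with $a$, and a 25 percent larger ITD roughly corresponds to a 25 percent larger head under that model. The sketch assumes 40 kHz sampling and rounding to whole samples; the function name is hypothetical.

```python
def scaled_itd_samples(itd_s, increases=(0.25, 0.50, 1.00), fs=40_000):
    """Far-ear delay in whole samples after enlarging the ITD by each
    fractional increase (0.25 = 25 percent larger)."""
    return {inc: int(round(itd_s * (1.0 + inc) * fs)) for inc in increases}
```

For a 0.4 ms ITD, the three enlarged delays are 20, 24, and 32 samples.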

4.10 Conclusion

This chapter has laid out the process for accomplishing the binaural room simulation for generating a signal that can be extracranialized. Using the Sun Sparc 2 workstation and the ESPS software, signals were appropriately filtered, delayed, attenuated, added, and combined to generate a collection of binaural sound files.


V. Informal Subjective Impressions

5.1  Introduction 

This chapter details the results obtained from the binaural room simulation discussed in Chapter 4. These results were obtained from three sources: 1) an initial investigation through the playing of a comparative group of sounds, 2) further investigation through the playing of a wide variety of sounds in a double-blind experiment, and 3) informal tests to determine the effects of changing head size. The factors manipulated in these investigations were angle of incidence, number of reflections, attenuation of the reflections, and ITD (in the increasing of the virtual head size).

5.2  Results  of  the  Initial  Investigation 

When the initial investigation began, the collection of sounds consisted of 42 binaural sound files. The breakdown of the sound files was as follows:

• From 285 degrees of azimuth and 10 degrees of elevation, there were 7 sounds containing HRTF and time delay data. These sounds ranged from direct path to direct path plus all single reflections.

• Also from 285 degrees of azimuth and 10 degrees of elevation, there were 7 sounds containing HRTF, time delay, and attenuation data. These sounds ranged from direct path to direct path plus all single reflections.

• From 240 degrees of azimuth and 10 degrees of elevation, there were 7 sounds containing HRTF and time delay data. These sounds ranged from direct path to direct path plus all single reflections.

• Also from 240 degrees of azimuth and 10 degrees of elevation, there were 7 sounds containing HRTF, time delay, and attenuation data. These sounds ranged from direct path to direct path plus all single reflections.

• From 180 degrees of azimuth and 0 degrees of elevation, there were 7 sounds containing HRTF and time delay data. These sounds ranged from direct path to direct path plus all single reflections.

• Also from 180 degrees of azimuth and 0 degrees of elevation, there were 7 sounds containing HRTF, time delay, and attenuation data. These sounds ranged from direct path to direct path plus all single reflections.

Using these stimuli, listeners were asked to make some forced-choice paired comparisons. These comparisons were accomplished in order to gain qualitative information that would be helpful in a more detailed experiment. The information was collected through verbal responses by the subjects. During this initial investigation, responses from the listeners indicated that there was not a significant difference between listening to the sounds through the insert headphones and through the Walkman headphones. Furthermore, listeners preferred the comfort of the Walkman-type headphones. Therefore, the Walkman headphones were used for all further investigations (the transfer function for these headphones was not flat). The sound files for 285 degrees of azimuth were listened to first. Listeners were made aware that the sounds were all coming from the same direction, and they were asked to verbally state differences noted when listening to a series of sounds.

5.2.1 Comparisons for Addition of Reflections without Attenuation. The first collection of sounds that was played was:

•  Direct  Path 

• Direct Path, Reflection off Wall 1

• Direct Path, Reflections off Wall 1 and Wall 2

• Direct Path, Reflections off Wall 1, Wall 2, and Wall 3

• Direct Path, Reflections off Wall 1, Wall 2, Wall 3, and Wall 4

• Direct Path, Reflections off Walls 1, 2, 3, 4, and the Floor

• Direct Path, Reflections off Walls 1, 2, 3, 4, and the Floor and Ceiling

All of the listeners were able to localize the direct path sound signal, which included HRTF and ITD data. Listeners tended to lateralize the sound in the proper quadrant; however, most listeners did not extracranialize the sound. The addition of a single reflection into the sound brought a mixture of responses from the listeners. Lateralization in azimuth and elevation was similar to the direct path case, but listeners definitely noted a change. Some subjects reported that the sound was just outside the head, while others reported in-head localization. With the addition of a second reflection, all listeners could tell the sound was coming from their right, but some could not lateralize the source as well as with the direct path alone. The addition of a third reflection resulted in similar comments from the listeners. In both cases, more listeners reported localizing the sound slightly outside of the head. The addition of four or more reflections into the sound file, however, resulted in declining accuracy in lateralization of the sound. Subjects began to localize on multiple sound sources, and in most cases the subjects could not localize the source at all. Furthermore, including more than three reflections caused the listeners to hear echoes or, as some stated, to hear "in stereo". Since the reflections presented to the listeners in this case were not attenuated, they did not sound natural, and the brain attempted to localize the source of the reflections as well as the original source.

5.2.2 Comparisons with Attenuated Reflections. After listening to the first collection of sounds, listeners were asked to listen to seven different sounds. These sounds were played in the same order described in the previous section; however, this group of files accounted for the attenuation in the sound due to the distance traveled through air.

Comments from the listeners for the direct path case were similar to those made for the unattenuated direct path. Lateralization in azimuth and elevation was again in the proper quadrant, and listeners reported in-head, on-head, and slightly out-of-head localization. The addition of one reflection caused very little change in the listeners' responses; however, some did report that the sound source seemed farther away. When a third reflection was added, more listeners experienced extracranialization of the sound, and localization in azimuth and elevation was still good. The addition of four, five, and six reflections resulted in similar comments. Nearly all listeners reported out-of-head localization when four or more reflections were included in the sound signal. Furthermore, none of the listeners reported hearing echoes, and localization of the sounds was still in the proper quadrant.

5.2.3 Comparisons Between Attenuated and Unattenuated Reflections. In the final step of the preliminary investigation, the listeners were asked to compare the quality of different pairs of sounds. The order within each comparison was alternated, and listeners received the same pair of sounds multiple times. One of the first comparisons made was direct path (non-attenuated) to direct path (attenuated). Interestingly, many listeners reported that the attenuated direct path signal sounded farther away than the non-attenuated direct path signal. Listeners also reported the perception of increased distance as the number of reflections was increased. Numerous combinations were compared, with the most interesting result being that when three or more attenuated reflections were included in the sound signal, all listeners perceived this signal as being farther away than the non-attenuated direct path signal.

5.3  Double-Blind  Experiment 

5.3.1 Purpose and Scope. The emphasis of this thesis was not placed on
performing a detailed human factors analysis of the final product of this research;
however, a simple experiment was conducted. The purpose of this experiment
was to gain more knowledge about what factors affect extracranialization of sounds
while avoiding experimenter bias. The experiment and the interpretation of its
results were used primarily to draw some broad conclusions about the effects of
certain aspects of the room simulation on a listener. After the initial investigation,
groups of 14 sounds were developed for four additional locations (see Chapter 4), and
four double reflections were included in the sound files for 285 degrees of azimuth.
This brought the number of binaural sound files to 106. Twenty-five sounds were
selected from the 106; the selection of the sounds was purposeful. For example, since
the purpose of this experiment was to determine what cues cause the sound to be
extracranialized by the listener, many of the sounds containing a large number of
non-attenuated reflections were not included. From the initial investigation, these sounds
were determined to be difficult to localize and unrealistic. The sounds selected for
the test are listed in Table 1. These sound files were randomly assigned a code word
and then randomly ordered so that the experimenter did not know which sound was
being presented to the listener and therefore could not bias the subject's response.
After the experiment, the experimenter matched the code words to the appropriate
sounds and determined the location of each sound that was presented to each subject.
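The blinding procedure described above (a random code word for each sound, a random presentation order, and a key withheld until the session is over) can be sketched as follows. This is a minimal illustration, not the procedure actually used in this thesis: the function name `blind_schedule`, the code-word format, and the use of Python's `random` module are assumptions.

```python
import random


def blind_schedule(sound_files, seed=None):
    """Return (presentation_order, key) for a double-blind listening session.

    Each sound file receives a random code word. The presentation order lists
    code words only; `key` (code word -> sound file) stays sealed until the
    session ends. The "C###" code-word format is a hypothetical choice.
    """
    rng = random.Random(seed)
    numbers = rng.sample(range(1000), len(sound_files))  # unique random numbers
    words = [f"C{n:03d}" for n in numbers]
    key = dict(zip(words, sound_files))
    order = list(key)
    rng.shuffle(order)  # random presentation order, independent of file order
    return order, key


order, key = blind_schedule([f"sound{i:02d}" for i in range(25)], seed=1993)
print(order[:3])  # the experimenter sees only code words
```

Only after all responses are recorded would `key` be used to map each code word back to its sound file and source location.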

5.3.2 Results. The results of this experiment strongly supported the results
of the initial investigation. Fifteen subjects were asked to listen to the collection of
sound files listed in Table 1. Five subjects listened to all 25 sounds. The other
ten subjects listened to a portion of the collection due to time constraints. Between
7 and 13 subjects listened to each sound. Listeners lateralized the azimuth and
elevation of the sound sources well when only the direct path source was presented.
Some listeners did experience front/back reversals, but the same listeners continued
to experience front/back reversals regardless of the number of attenuated reflections
added to the signal. Up/down confusions were less frequent, with localization higher
than intended being the primary problem. The results for extracranialization of the
sounds were also similar to those of the initial investigation. The quantitative results
of this experiment are shown in Table 2. The "A" in the table stands for accurate,
the "I" for inaccurate, and the "U" for unable to localize.




Table  1.  Location  of  the  Sounds  used  in  the  Double-Blind  Experiment 




Table  2.  Responses  for  the  Double-Blind  Experiment 




When only the non-attenuated direct path was presented to the listeners, only
about 20 percent reported that they heard the sound outside their head. When the
direct path was attenuated, this percentage rose to 60 percent. Approximately half
of the listeners reported hearing echoes and were unable to localize signals that
included only two non-attenuated reflections. The percentage of listeners unable
to localize the sound due to echoes or "stereo" rose to nearly 90 percent when four
non-attenuated reflections were included. When the reflections were attenuated,
however, the number of listeners reporting localization outside their heads increased.
With four attenuated reflections, approximately 80 percent of the listeners reported
localization extracranially. With six or more attenuated reflections included in the
signal, 90 percent localized extracranially.

Although the listeners reported hearing the sound outside their heads when
attenuated reflections were included in the signal, the perceived distance to
the source varied from 1 to 20 feet. Some subjects indicated that the volume of the
signal provided information about the distance. Furthermore, they stated that not
all of the signals appeared to be at the same volume.

5.4  Increasing  the  Virtual  Head  Size 

The ITD cue was increased by 25, 50, and 100 percent in each of the previously
generated sound files, and 6 subjects listened to the new sounds. A very interesting
result was obtained. Listeners who localized sounds extracranially with the original
ITD cues did not receive a recognizable benefit from the increased ITD cue. However,
some of the listeners who had trouble extracranializing the original sounds were able
to extracranialize the sounds containing increased ITD cues. Not coincidentally, these
listeners had slightly larger than average head sizes.
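Enlarging the ITD cue amounts to multiplying each interaural delay by a constant factor and re-expressing it as a whole number of samples. The sketch below assumes integer-sample delays and a caller-supplied sampling rate; the thesis does not state the rate or rounding rule used, so `fs_hz` and the rounding are assumptions, and both function names are hypothetical.

```python
def scaled_itd_us(itd_us, percent_increase):
    """Enlarge an interaural time delay (microseconds) by a percentage."""
    return itd_us * (1.0 + percent_increase / 100.0)


def itd_to_samples(itd_us, fs_hz):
    """Nearest whole-sample delay for an ITD in microseconds (assumed rounding rule)."""
    return round(itd_us * fs_hz / 1_000_000)


# The three increases tested in this section, applied to the 675-microsecond
# ITD listed for speaker 19 in Appendix B.
for pct in (25, 50, 100):
    print(pct, scaled_itd_us(675, pct))
```

For speaker 19 this gives 843.75, 1012.5, and 1350 microseconds; the longer delay in the lagging ear mimics a wider head.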




5.5 Summary of Results

The results of this thesis research are encouraging. Analysis of the binaural
sound signals generated in the binaural simulation shows that computer-generated
binaural signals presented over headphones without head coupling can be
extracranialized. In particular, the addition of attenuated reflections in a binaural room
simulation is key to increasing the listener's extracranialization of synthesized sound.
Localization in azimuth and elevation is not noticeably improved with the addition
of these same cues. Furthermore, the distance between a listener's ears appears to
be an important factor in extracranialization; consequently, the ITDs need to be
adjusted to account for substantial differences in head size.


VI. Conclusions and Recommendations


6.1  Summary 

During the course of this thesis work, two research objectives were the focus:
1) the development of a mathematical model to describe the transfer function
for a sound traveling from a source to a receiver in a rectangular room, and 2) the
generation of binaural sound signals that can be extracranialized by a listener wearing
headphones without head coupling. In short, this research centered on adding cues
described in the mathematical model to a sound file using a Sun Sparc 2 Workstation
and the ESPS software. The result was a binaural room simulation that yielded
sounds that could be extracranialized by most listeners.

In the first step of the simulation, a monophonic sound file was created. The
sound file contained approximately three seconds of male speech. Once the sound
file was generated, it was filtered with the HRTF for the direct sound
source and for the various virtual sound sources representing reflections. Next, the resulting
sound files were delayed to account for the ITD cue and the travel
distance to the listener. When these sound files were presented simultaneously to
the appropriate ears of a listener, the resulting binaural signals ranged from the direct
path alone to the direct path plus all single reflections (one case also included the first four double
reflections). Sounds that accounted for three or fewer wall reflections could be
lateralized intracranially by most listeners. However, when four or more reflections were
included in the sound, listeners' ability to localize the sound source deteriorated.
In most cases, regardless of the number of reflections added, the listeners failed to
extracranialize the sound.
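The processing chain summarized above (HRTF filtering of the monophonic recording, a delay for each real or virtual source, and summation for each ear) can be sketched in a few lines. This is an illustrative reconstruction in plain Python, not the ESPS processing actually used; the function names are hypothetical, each HRTF is represented by its FIR impulse response (a list of taps), and delays are rounded to whole samples.

```python
def fir_filter(signal, taps):
    """Direct-form FIR convolution: y[n] = sum_k taps[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += x * h
    return out


def delay(signal, samples):
    """Integer-sample delay: prepend `samples` zeros."""
    return [0.0] * samples + list(signal)


def mix(tracks):
    """Sum tracks of unequal length sample by sample."""
    out = [0.0] * max(len(t) for t in tracks)
    for t in tracks:
        for i, v in enumerate(t):
            out[i] += v
    return out


def render_ear(mono, paths):
    """Signal for one ear.

    `paths` holds one (hrir_taps, delay_samples) pair per real or virtual
    source; the delay combines the ITD and the travel time of that path.
    """
    return mix([delay(fir_filter(mono, h), d) for h, d in paths])


# Toy example: direct path plus one weaker, later reflection.
print(render_ear([1.0, 0.0], [([1.0, 0.5], 0), ([0.25], 2)]))
```

Running `render_ear` once with left-ear HRTFs and delays and once with right-ear ones gives the binaural pair; the cost grows with the number of sources times the HRTF length, which is consistent with the long processing times noted in Section 6.3.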

The next step in this research was to include the effects of attenuation resulting
from the sound traveling through air. Once the attenuation was added to the sound
files, the files were presented simultaneously to listeners. The resulting binaural
sound files again ranged from the direct path alone to the direct path plus all single reflections;
however, the addition of attenuation to these files had a drastically different
effect on the listener. Once again, localization was in the proper quadrant. In this
case, however, listeners reported extracranialization of the sound. In fact, when
only the direct sound source was included, listeners perceived the signal containing
attenuation to be farther away than the signal that contained only HRTF and time
delay information. Furthermore, the more attenuated reflections (up to 10) that
were included in the sound file, the greater the perceived distance.
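The attenuation law itself is not restated here. A common simple choice is spherical spreading, in which amplitude falls as 1/r, combined with a propagation delay of r/c. The sketch below assumes that law, a speed of sound of 343 m/s, and whole-sample delays; all three are assumptions rather than values taken from this thesis, and any frequency-dependent absorption by air is ignored.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def path_gain(distance_m, ref_m=1.0):
    """Amplitude gain under an assumed 1/r spherical-spreading law."""
    return ref_m / distance_m


def travel_delay_samples(distance_m, fs_hz, c=SPEED_OF_SOUND):
    """Propagation delay of a path, rounded to whole samples."""
    return round(distance_m / c * fs_hz)


def relative_paths(distances_m, fs_hz):
    """(gain, delay) of each path relative to the shortest (direct) path."""
    d0 = min(distances_m)
    base = travel_delay_samples(d0, fs_hz)
    return [(path_gain(d, d0), travel_delay_samples(d, fs_hz) - base)
            for d in distances_m]


# Left-ear path lengths (m) for the direct path, wall 1, and wall 2 in Appendix A.
print(relative_paths([2.1476, 13.2179, 28.0897], 10000))
```

Under this assumed law, the wall 1 and wall 2 reflections for the left ear arrive at roughly 0.16 and 0.076 of the direct path's amplitude, which illustrates why attenuated reflections behave so differently from non-attenuated ones.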

6.2 Conclusions

The mathematical model developed in Chapter .1 provides an accurate description
of the transfer function for a sound traveling from a source to a listener in a
rectangular room. Furthermore, the implementation of pieces of this model in a
binaural room simulation (in particular, the addition of virtual sound sources to
account for all single reflections and the attenuation of all the sound sources due to
the distance traveled through air) proved to be an effective means of providing the
listener with an extracranialized sound. The implementation of these cues in the
sound signal definitely increased the degree of telepresence reported by the listener.
Localization in the azimuthal plane was only slightly improved over signals containing
HRTF data but no attenuation information. Many of the listeners were less accurate
in localizing in elevation. A majority of the mistakes came from listeners localizing
the sound higher in elevation than intended. Listeners also had difficulty
localizing sounds on or near the median plane. This difficulty persisted regardless
of the number of attenuated reflections included. Overall, the goal of creating
an extracranialized binaural signal was accomplished by incorporating the effects of
attenuation into reflections represented by virtual sound sources.


6.3 Recommendations for Future Research

This thesis successfully demonstrated that a binaural room simulation can be
used to generate binaural signals that listeners can localize extracranially without
head motion. However, the outcome of this research is far from a finished product. There
are several research projects that could be performed as follow-on research to this
thesis.

One of the biggest problems with the method used in this research was the
enormous amount of processing time and memory space required to generate the
final binaural signal. This extensive processing of sounds clearly did not allow for
any real-time applications. A real-time system that could incorporate head motion
and include the necessary attenuated reflections would be an excellent topic. The
development of a real-time system that included head motion should be effective in
eliminating the problems of localizing on the median plane as well as the front/back
reversals.

Several other follow-on projects to this research exist. One project would be to
test both left-ear-dominant and right-ear-dominant listeners to determine whether ear dominance affects
localization of the binaural sounds generated in this thesis. A test for right and left
ear dominance was implemented at AFIT in the Fall of 1993 (7). A more extensive
study of the effects of head size on localization could also be very informative. The
goal of such a study might be to develop a set of "head sizes" that can be selected by
a listener for the best fit. The use of IIR filters in place of the FIR filters used in this
research for the HRTFs might provide a means of speeding up the pre-processing
time. This could be a step toward the goal of a real-time virtual audio system.
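To illustrate why IIR filters might reduce pre-processing time, the sketch below implements a generic direct-form I IIR filter. A recursive section with only a few coefficients can produce an impulse response that an FIR filter would need many taps to approximate. Fitting IIR coefficients to a measured HRTF is a separate design problem that is not shown; the function name and example coefficients are illustrative assumptions.

```python
def iir_filter(x, b, a):
    """Direct-form I IIR filter.

    a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


# One-pole example, y[n] = x[n] + 0.5*y[n-1]: its impulse response never ends,
# yet each output sample costs only two multiply-adds.
print(iir_filter([1.0, 0.0, 0.0, 0.0], b=[1.0], a=[1.0, -0.5]))
```

This prints `[1.0, 0.5, 0.25, 0.125]`; an FIR filter reproducing the same decay to a given accuracy would need one multiply per retained tap.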




Appendix  A.  MathCad  Output 

The Mathcad output on the following pages was used in the simulation performed
at approximately 285 degrees of azimuth. Note that the notation is not
identical to that used in the development of the mathematical model, but the
concepts used are the same.


The  following  Mathcad  program  calculates  the  distance  traveled 
from  a  sound  source  to  a  receiver  for  a  direct  path  and  six  reflections 
in  a  room.  The  angle  of  incidence  at  the  receiver  is  also  calculated. 

Width of the room (W) in meters: $W := 15$

Length of the room (L) in meters: $L := 20$

Height of the room (H) in meters: $H := 5$


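Source Position

$$S := \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}, \qquad X := .58\,W, \quad Y := .65\,L, \quad Z := .20\,H$$

Receiver Position

$$R := \begin{pmatrix} \xi \\ \eta \\ \zeta \end{pmatrix}, \qquad \xi := .55\,W, \quad \eta := .75\,L, \quad \zeta := .15\,H$$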


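Let $e_F$, $e_L$, and $e_U$ be coordinate vectors of unit length, where the following combination indicates that the receiver is looking straight ahead at wall 1.

$$a_1 := 1 \quad b_1 := 0 \quad c_1 := 0 \qquad a_2 := 0 \quad b_2 := 1 \quad c_2 := 0 \qquad a_3 := 0 \quad b_3 := 0 \quad c_3 := 1$$

$$e_F := \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \qquad e_L := \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} \qquad e_U := \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix}$$

Left Ear Position: $\mathrm{Ear}_L := R + .085\,e_L$

Right Ear Position: $\mathrm{Ear}_R := R - .085\,e_L$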
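Direct Path:

Distance:

Left Ear:

$$\mathrm{Dist}_{L0} := \sqrt{(\xi - X)^2 + \left[(\eta + b_2 \cdot .085) - Y\right]^2 + (\zeta - Z)^2} \qquad \mathrm{Dist}_{L0} = 2.1476$$

Right Ear:

$$\mathrm{Dist}_{R0} := \sqrt{(\xi - X)^2 + \left[(\eta - b_2 \cdot .085) - Y\right]^2 + (\zeta - Z)^2} \qquad \mathrm{Dist}_{R0} = 1.983$$

Incidence Angle: (theta is azimuth and phi is elevation)

Left Ear: Azimuth

$$LS := S - \mathrm{Ear}_L \qquad PLS := (LS \cdot e_F)\,e_F + (LS \cdot e_L)\,e_L \qquad f(P) := \mathrm{if}(P \cdot e_L < 0,\ -1,\ 1)$$

$$\theta_{L0} := f(PLS)\,\operatorname{acos}\!\left(\frac{PLS \cdot e_F}{|PLS|}\right) \qquad \theta_{L0} = -77.8208\,\mathrm{deg}$$

Elevation

$$\phi_{L0} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{LS \cdot e_U}{|LS|}\right) \qquad \phi_{L0} = 6.6849\,\mathrm{deg}$$

Right Ear: Azimuth

$$RgtS := S - \mathrm{Ear}_R \qquad PRgtS := (RgtS \cdot e_F)\,e_F + (RgtS \cdot e_L)\,e_L$$

$$\theta_{R0} := f(PRgtS)\,\operatorname{acos}\!\left(\frac{PRgtS \cdot e_F}{|PRgtS|}\right) \qquad \theta_{R0} = -76.7762\,\mathrm{deg}$$

Elevation

$$\phi_{R0} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{RgtS \cdot e_U}{|RgtS|}\right) \qquad \phi_{R0} = 7.2427\,\mathrm{deg}$$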
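The direct-path quantities above can be checked with a short script. The sketch below re-implements the template's formulas (ear positions, the projection onto the $e_F$-$e_L$ plane, the sign function $f$, and the elevation formula) in Python; the helper names are hypothetical, and Mathcad, not Python, was the tool actually used.

```python
import math

# Room size and positions from the Mathcad template (meters).
W, L, H = 15.0, 20.0, 5.0
S = (0.58 * W, 0.65 * L, 0.20 * H)  # source (X, Y, Z)
R = (0.55 * W, 0.75 * L, 0.15 * H)  # receiver (xi, eta, zeta)
e_F, e_L, e_U = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def scale(a, k):
    return tuple(k * x for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


ear_L = add(R, scale(e_L, 0.085))  # Ear_L = R + .085 e_L
ear_R = sub(R, scale(e_L, 0.085))  # Ear_R = R - .085 e_L


def incidence(source, ear):
    """Return (distance, azimuth theta, elevation phi), angles in degrees."""
    v = sub(source, ear)                                        # LS or RgtS
    p = add(scale(e_F, dot(v, e_F)), scale(e_L, dot(v, e_L)))   # PLS
    f = -1.0 if dot(p, e_L) < 0 else 1.0                        # f(P)
    theta = f * math.degrees(math.acos(dot(p, e_F) / norm(p)))
    phi = 90.0 - math.degrees(math.acos(dot(v, e_U) / norm(v)))
    return norm(v), theta, phi


print(incidence(S, ear_L))
print(incidence(S, ear_R))
```

`incidence(S, ear_L)` returns approximately (2.1476, -77.8208, 6.6849) and `incidence(S, ear_R)` approximately (1.983, -76.7762, 7.2427), matching the template.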


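One Reflection off Wall 1:

$$S_1 := \begin{pmatrix} W + (W - X) \\ Y \\ Z \end{pmatrix}$$

Distance:

Left Ear:

$$\mathrm{Dist}_{L1} := \sqrt{\left[\xi - (W + (W - X))\right]^2 + \left[(\eta + b_2 \cdot .085) - Y\right]^2 + (\zeta - Z)^2} \qquad \mathrm{Dist}_{L1} = 13.2179$$

Right Ear:

$$\mathrm{Dist}_{R1} := \sqrt{\left[\xi - (W + (W - X))\right]^2 + \left[(\eta - b_2 \cdot .085) - Y\right]^2 + (\zeta - Z)^2} \qquad \mathrm{Dist}_{R1} = 13.1921$$

Incidence Angle: (theta is azimuth and phi is elevation)

Left Ear: Azimuth

$$LS_1 := S_1 - \mathrm{Ear}_L \qquad PLS_1 := (LS_1 \cdot e_F)\,e_F + (LS_1 \cdot e_L)\,e_L \qquad f(P) := \mathrm{if}(P \cdot e_L < 0,\ -1,\ 1)$$

$$\theta_{L1} := f(PLS_1)\,\operatorname{acos}\!\left(\frac{PLS_1 \cdot e_F}{|PLS_1|}\right) \qquad \theta_{L1} = -9.0774\,\mathrm{deg}$$

Elevation

$$\phi_{L1} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{LS_1 \cdot e_U}{|LS_1|}\right)$$

Right Ear: Azimuth

$$RgtS_1 := S_1 - \mathrm{Ear}_R \qquad PRgtS_1 := (RgtS_1 \cdot e_F)\,e_F + (RgtS_1 \cdot e_L)\,e_L$$

$$\theta_{R1} := f(PRgtS_1)\,\operatorname{acos}\!\left(\frac{PRgtS_1 \cdot e_F}{|PRgtS_1|}\right) \qquad \theta_{R1} = -8.3482\,\mathrm{deg}$$

Elevation

$$\phi_{R1} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{RgtS_1 \cdot e_U}{|RgtS_1|}\right) \qquad \phi_{R1} = 1.0859\,\mathrm{deg}$$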
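The reflected paths rely on the image-source idea: each wall, floor, or ceiling reflection is replaced by a virtual source mirrored across that surface, and the path length is the straight-line distance from the virtual source to the ear. The sketch below computes the first-order images exactly as they are listed in this appendix, together with their left- and right-ear distances. It is a check on the arithmetic, not the thesis's Mathcad template, and the names are hypothetical.

```python
import math

W, L, H = 15.0, 20.0, 5.0
X, Y, Z = 0.58 * W, 0.65 * L, 0.20 * H            # source
xi, eta, zeta = 0.55 * W, 0.75 * L, 0.15 * H      # receiver (head center)
EAR_OFFSET = 0.085                                # half the interaural distance (m)

# First-order image sources, as listed in the template.
IMAGES = {
    "wall 1": (W + (W - X), Y, Z),
    "wall 2": (X, -Y, Z),
    "wall 3": (-X, Y, Z),
    "wall 4": (X, L + (L - Y), Z),
    "floor": (X, Y, -Z),
    "ceiling": (X, Y, H + (H - Z)),
}


def ear_distances(src):
    """Return (left, right) path lengths from an (image) source to the ears."""
    left = math.dist((xi, eta + EAR_OFFSET, zeta), src)
    right = math.dist((xi, eta - EAR_OFFSET, zeta), src)
    return left, right


for name, src in IMAGES.items():
    left, right = ear_distances(src)
    print(f"{name:8s} left {left:8.4f}  right {right:8.4f}")
```

The printed distances agree with the template to four decimal places (for example, 13.2179 and 13.1921 m for wall 1). `math.dist` requires Python 3.8 or later.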


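One Reflection off Wall 2:

$$S_2 := \begin{pmatrix} X \\ -Y \\ Z \end{pmatrix}$$

Distance:

Left Ear:

$$\mathrm{Dist}_{L2} := \sqrt{(\xi - X)^2 + \left[(\eta + b_2 \cdot .085) + Y\right]^2 + (\zeta - Z)^2} \qquad \mathrm{Dist}_{L2} = 28.0897$$

Right Ear:

$$\mathrm{Dist}_{R2} := \sqrt{(\xi - X)^2 + \left[(\eta - b_2 \cdot .085) + Y\right]^2 + (\zeta - Z)^2} \qquad \mathrm{Dist}_{R2} = 27.9197$$

Incidence Angle: (theta is azimuth and phi is elevation)

Left Ear: Azimuth

$$LS_2 := S_2 - \mathrm{Ear}_L \qquad PLS_2 := (LS_2 \cdot e_F)\,e_F + (LS_2 \cdot e_L)\,e_L \qquad f(P) := \mathrm{if}(P \cdot e_L < 0,\ -1,\ 1)$$

$$\theta_{L2} := f(PLS_2)\,\operatorname{acos}\!\left(\frac{PLS_2 \cdot e_F}{|PLS_2|}\right) \qquad \theta_{L2} = -89.082\,\mathrm{deg}$$

Elevation

$$\phi_{L2} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{LS_2 \cdot e_U}{|LS_2|}\right) \qquad \phi_{L2} = 0.5099\,\mathrm{deg}$$

Right Ear: Azimuth

$$RgtS_2 := S_2 - \mathrm{Ear}_R \qquad PRgtS_2 := (RgtS_2 \cdot e_F)\,e_F + (RgtS_2 \cdot e_L)\,e_L$$

$$\theta_{R2} := f(PRgtS_2)\,\operatorname{acos}\!\left(\frac{PRgtS_2 \cdot e_F}{|PRgtS_2|}\right) \qquad \theta_{R2} = -89.0765\,\mathrm{deg}$$

Elevation

$$\phi_{R2} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{RgtS_2 \cdot e_U}{|RgtS_2|}\right) \qquad \phi_{R2} = 0.513\,\mathrm{deg}$$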


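One Reflection off Wall 3:

$$S_3 := \begin{pmatrix} -X \\ Y \\ Z \end{pmatrix}$$

Distance:

Left Ear:

$$\mathrm{Dist}_{L3} := \sqrt{(\xi + X)^2 + \left[(\eta + b_2 \cdot .085) - Y\right]^2 + (\zeta - Z)^2} \qquad \mathrm{Dist}_{L3} = 17.0796$$

Right Ear:

$$\mathrm{Dist}_{R3} := \sqrt{(\xi + X)^2 + \left[(\eta - b_2 \cdot .085) - Y\right]^2 + (\zeta - Z)^2} \qquad \mathrm{Dist}_{R3} = 17.0597$$

Incidence Angle: (theta is azimuth and phi is elevation)

Left Ear: Azimuth

$$LS_3 := S_3 - \mathrm{Ear}_L \qquad PLS_3 := (LS_3 \cdot e_F)\,e_F + (LS_3 \cdot e_L)\,e_L \qquad f(P) := \mathrm{if}(P \cdot e_L < 0,\ -1,\ 1)$$

$$\theta_{L3} := f(PLS_3)\,\operatorname{acos}\!\left(\frac{PLS_3 \cdot e_F}{|PLS_3|}\right) \qquad \theta_{L3} = -172.9873\,\mathrm{deg}$$

Elevation

$$\phi_{L3} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{LS_3 \cdot e_U}{|LS_3|}\right) \qquad \phi_{L3} = 0.8387\,\mathrm{deg}$$

Right Ear: Azimuth

$$RgtS_3 := S_3 - \mathrm{Ear}_R \qquad PRgtS_3 := (RgtS_3 \cdot e_F)\,e_F + (RgtS_3 \cdot e_L)\,e_L$$

$$\theta_{R3} := f(PRgtS_3)\,\operatorname{acos}\!\left(\frac{PRgtS_3 \cdot e_F}{|PRgtS_3|}\right) \qquad \theta_{R3} = -173.5541\,\mathrm{deg}$$

Elevation

$$\phi_{R3} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{RgtS_3 \cdot e_U}{|RgtS_3|}\right) \qquad \phi_{R3} = 0.8397\,\mathrm{deg}$$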


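One Reflection off Wall 4:

$$S_4 := \begin{pmatrix} X \\ L + (L - Y) \\ Z \end{pmatrix}$$

Distance:

Left Ear:

$$\mathrm{Dist}_{L4} := \sqrt{(\xi - X)^2 + \left[(\eta + b_2 \cdot .085) - (2 \cdot L - Y)\right]^2 + (\zeta - Z)^2} \qquad \mathrm{Dist}_{L4} = 11.9261$$

Right Ear:

$$\mathrm{Dist}_{R4} := \sqrt{(\xi - X)^2 + \left[(\eta - b_2 \cdot .085) - (2 \cdot L - Y)\right]^2 + (\zeta - Z)^2} \qquad \mathrm{Dist}_{R4} = 12.096$$

Incidence Angle: (theta is azimuth and phi is elevation)

Left Ear: Azimuth

$$LS_4 := S_4 - \mathrm{Ear}_L \qquad PLS_4 := (LS_4 \cdot e_F)\,e_F + (LS_4 \cdot e_L)\,e_L \qquad f(P) := \mathrm{if}(P \cdot e_L < 0,\ -1,\ 1)$$

$$\theta_{L4} := f(PLS_4)\,\operatorname{acos}\!\left(\frac{PLS_4 \cdot e_F}{|PLS_4|}\right) \qquad \theta_{L4} = 87.8371\,\mathrm{deg}$$

Elevation

$$\phi_{L4} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{LS_4 \cdot e_U}{|LS_4|}\right) \qquad \phi_{L4} = 1.2011\,\mathrm{deg}$$

Right Ear: Azimuth

$$RgtS_4 := S_4 - \mathrm{Ear}_R \qquad PRgtS_4 := (RgtS_4 \cdot e_F)\,e_F + (RgtS_4 \cdot e_L)\,e_L$$

$$\theta_{R4} := f(PRgtS_4)\,\operatorname{acos}\!\left(\frac{PRgtS_4 \cdot e_F}{|PRgtS_4|}\right) \qquad \theta_{R4} = 87.8675\,\mathrm{deg}$$

Elevation

$$\phi_{R4} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{RgtS_4 \cdot e_U}{|RgtS_4|}\right) \qquad \phi_{R4} = 1.1843\,\mathrm{deg}$$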


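One Reflection off Floor:

$$S_F := \begin{pmatrix} X \\ Y \\ -Z \end{pmatrix}$$

Distance:

Left Ear:

$$\mathrm{Dist}_{LF} := \sqrt{(\xi - X)^2 + \left[(\eta + b_2 \cdot .085) - Y\right]^2 + (\zeta + Z)^2} \qquad \mathrm{Dist}_{LF} = 2.759$$

Right Ear:

$$\mathrm{Dist}_{RF} := \sqrt{(\xi - X)^2 + \left[(\eta - b_2 \cdot .085) - Y\right]^2 + (\zeta + Z)^2} \qquad \mathrm{Dist}_{RF} = 2.6329$$

Incidence Angle: (theta is azimuth and phi is elevation)

Left Ear: Azimuth

$$LSF := S_F - \mathrm{Ear}_L \qquad PLSF := (LSF \cdot e_F)\,e_F + (LSF \cdot e_L)\,e_L \qquad f(P) := \mathrm{if}(P \cdot e_L < 0,\ -1,\ 1)$$

$$\theta_{LF} := f(PLSF)\,\operatorname{acos}\!\left(\frac{PLSF \cdot e_F}{|PLSF|}\right) \qquad \theta_{LF} = -77.8208\,\mathrm{deg}$$

Elevation

$$\phi_{LF} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{LSF \cdot e_U}{|LSF|}\right) \qquad \phi_{LF} = -39.3667\,\mathrm{deg}$$

Right Ear: Azimuth

$$RgtSF := S_F - \mathrm{Ear}_R \qquad PRgtSF := (RgtSF \cdot e_F)\,e_F + (RgtSF \cdot e_L)\,e_L$$

$$\theta_{RF} := f(PRgtSF)\,\operatorname{acos}\!\left(\frac{PRgtSF \cdot e_F}{|PRgtSF|}\right) \qquad \theta_{RF} = -76.7762\,\mathrm{deg}$$

Elevation

$$\phi_{RF} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{RgtSF \cdot e_U}{|RgtSF|}\right) \qquad \phi_{RF} = -41.6565\,\mathrm{deg}$$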


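One Reflection off Ceiling:

$$S_C := \begin{pmatrix} X \\ Y \\ H + (H - Z) \end{pmatrix}$$

Distance:

Left Ear:

$$\mathrm{Dist}_{LC} := \sqrt{(\xi - X)^2 + \left[(\eta + b_2 \cdot .085) - Y\right]^2 + \left[\zeta - (2 \cdot H - Z)\right]^2} \qquad \mathrm{Dist}_{LC} = 8.5213$$

Right Ear:

$$\mathrm{Dist}_{RC} := \sqrt{(\xi - X)^2 + \left[(\eta - b_2 \cdot .085) - Y\right]^2 + \left[\zeta - (2 \cdot H - Z)\right]^2} \qquad \mathrm{Dist}_{RC} = 8.4813$$

Incidence Angle: (theta is azimuth and phi is elevation)

Left Ear: Azimuth

$$LSC := S_C - \mathrm{Ear}_L \qquad PLSC := (LSC \cdot e_F)\,e_F + (LSC \cdot e_L)\,e_L \qquad f(P) := \mathrm{if}(P \cdot e_L < 0,\ -1,\ 1)$$

$$\theta_{LC} := f(PLSC)\,\operatorname{acos}\!\left(\frac{PLSC \cdot e_F}{|PLSC|}\right) \qquad \theta_{LC} = -77.8208\,\mathrm{deg}$$

Elevation

$$\phi_{LC} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{LSC \cdot e_U}{|LSC|}\right) \qquad \phi_{LC} = 75.5038\,\mathrm{deg}$$

Right Ear: Azimuth

$$RgtSC := S_C - \mathrm{Ear}_R \qquad PRgtSC := (RgtSC \cdot e_F)\,e_F + (RgtSC \cdot e_L)\,e_L$$

$$\theta_{RC} := f(PRgtSC)\,\operatorname{acos}\!\left(\frac{PRgtSC \cdot e_F}{|PRgtSC|}\right) \qquad \theta_{RC} = -76.7762\,\mathrm{deg}$$

Elevation

$$\phi_{RC} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{RgtSC \cdot e_U}{|RgtSC|}\right) \qquad \phi_{RC} = 76.5886\,\mathrm{deg}$$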


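Two Reflections off Floor and Wall 1:

$$S_{F1} := \begin{pmatrix} W + (W - X) \\ Y \\ -Z \end{pmatrix}$$

Distance:

Left Ear:

$$\mathrm{Dist}_{LF1} := \sqrt{\left[\xi - (2 \cdot W - X)\right]^2 + \left[(\eta + b_2 \cdot .085) - Y\right]^2 + (\zeta + Z)^2} \qquad \mathrm{Dist}_{LF1} = 13.3309$$

Right Ear:

$$\mathrm{Dist}_{RF1} := \sqrt{\left[\xi - (2 \cdot W - X)\right]^2 + \left[(\eta - b_2 \cdot .085) - Y\right]^2 + (\zeta + Z)^2} \qquad \mathrm{Dist}_{RF1} = 13.3053$$

Incidence Angle: (theta is azimuth and phi is elevation)

Left Ear: Azimuth

$$LSF1 := S_{F1} - \mathrm{Ear}_L \qquad PLSF1 := (LSF1 \cdot e_F)\,e_F + (LSF1 \cdot e_L)\,e_L \qquad f(P) := \mathrm{if}(P \cdot e_L < 0,\ -1,\ 1)$$

$$\theta_{LF1} := f(PLSF1)\,\operatorname{acos}\!\left(\frac{PLSF1 \cdot e_F}{|PLSF1|}\right)$$

Elevation

$$\phi_{LF1} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{LSF1 \cdot e_U}{|LSF1|}\right)$$

Right Ear: Azimuth

$$RgtSF1 := S_{F1} - \mathrm{Ear}_R \qquad PRgtSF1 := (RgtSF1 \cdot e_F)\,e_F + (RgtSF1 \cdot e_L)\,e_L$$

$$\theta_{RF1} := f(PRgtSF1)\,\operatorname{acos}\!\left(\frac{PRgtSF1 \cdot e_F}{|PRgtSF1|}\right) \qquad \theta_{RF1} = -8.3482\,\mathrm{deg}$$

Elevation

$$\phi_{RF1} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{RgtSF1 \cdot e_U}{|RgtSF1|}\right) \qquad \phi_{RF1} = -7.5578\,\mathrm{deg}$$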

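Two Reflections off Floor and Wall 3:

Distance:

Left Ear:

$$\mathrm{Dist}_{LF3} := \sqrt{(\xi + X)^2 + \left[(\eta + b_2 \cdot .085) - Y\right]^2 + (\zeta + Z)^2} \qquad \mathrm{Dist}_{LF3} = 17.1672$$

Right Ear:

$$\mathrm{Dist}_{RF3} := \sqrt{(\xi + X)^2 + \left[(\eta - b_2 \cdot .085) - Y\right]^2 + (\zeta + Z)^2} \qquad \mathrm{Dist}_{RF3} = 17.1474$$

Incidence Angle: (theta is azimuth and phi is elevation)

Left Ear: Azimuth

$$LSF3 := S_{F3} - \mathrm{Ear}_L \qquad PLSF3 := (LSF3 \cdot e_F)\,e_F + (LSF3 \cdot e_L)\,e_L \qquad f(P) := \mathrm{if}(P \cdot e_L < 0,\ -1,\ 1)$$

$$\theta_{LF3} := f(PLSF3)\,\operatorname{acos}\!\left(\frac{PLSF3 \cdot e_F}{|PLSF3|}\right) \qquad \theta_{LF3} = -172.9873\,\mathrm{deg}$$

Elevation

$$\phi_{LF3} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{LSF3 \cdot e_U}{|LSF3|}\right)$$

Right Ear: Azimuth

$$RgtSF3 := S_{F3} - \mathrm{Ear}_R \qquad PRgtSF3 := (RgtSF3 \cdot e_F)\,e_F + (RgtSF3 \cdot e_L)\,e_L$$

$$\theta_{RF3} := f(PRgtSF3)\,\operatorname{acos}\!\left(\frac{PRgtSF3 \cdot e_F}{|PRgtSF3|}\right) \qquad \theta_{RF3} = -173.5541\,\mathrm{deg}$$

Elevation

$$\phi_{RF3} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{RgtSF3 \cdot e_U}{|RgtSF3|}\right) \qquad \phi_{RF3} = -5.8576\,\mathrm{deg}$$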


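Two Reflections off Floor and Ceiling:

$$S_{FC} := \begin{pmatrix} X \\ Y \\ H + (H + Z) \end{pmatrix}$$

Distance:

Left Ear:

$$\mathrm{Dist}_{LFC} := \sqrt{(\xi - X)^2 + \left[(\eta + b_2 \cdot .085) - Y\right]^2 + \left[\zeta - (2 \cdot H + Z)\right]^2} \qquad \mathrm{Dist}_{LFC} = 10.4696$$

Right Ear:

$$\mathrm{Dist}_{RFC} := \sqrt{(\xi - X)^2 + \left[(\eta - b_2 \cdot .085) - Y\right]^2 + \left[\zeta - (2 \cdot H + Z)\right]^2} \qquad \mathrm{Dist}_{RFC} = 10.4371$$

Incidence Angle: (theta is azimuth and phi is elevation)

Left Ear: Azimuth

$$LSFC := S_{FC} - \mathrm{Ear}_L \qquad PLSFC := (LSFC \cdot e_F)\,e_F + (LSFC \cdot e_L)\,e_L \qquad f(P) := \mathrm{if}(P \cdot e_L < 0,\ -1,\ 1)$$

$$\theta_{LFC} := f(PLSFC)\,\operatorname{acos}\!\left(\frac{PLSFC \cdot e_F}{|PLSFC|}\right) \qquad \theta_{LFC} = -77.8208\,\mathrm{deg}$$

Elevation

$$\phi_{LFC} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{LSFC \cdot e_U}{|LSFC|}\right)$$

Right Ear: Azimuth

$$RgtSFC := S_{FC} - \mathrm{Ear}_R \qquad PRgtSFC := (RgtSFC \cdot e_F)\,e_F + (RgtSFC \cdot e_L)\,e_L$$

$$\theta_{RFC} := f(PRgtSFC)\,\operatorname{acos}\!\left(\frac{PRgtSFC \cdot e_F}{|PRgtSFC|}\right) \qquad \theta_{RFC} = -76.7762\,\mathrm{deg}$$

Elevation

$$\phi_{RFC} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{RgtSFC \cdot e_U}{|RgtSFC|}\right) \qquad \phi_{RFC} = 79.136\,\mathrm{deg}$$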
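The double-reflection images above can be built by applying single mirror reflections in sequence. The sketch below composes reflections in the orders floor then wall 1, floor then wall 3, and floor then ceiling, and reproduces the corresponding images and distances listed in this appendix. The `reflect` and `image` helpers are hypothetical illustrations, not part of the original template.

```python
import math

W, L, H = 15.0, 20.0, 5.0
X, Y, Z = 0.58 * W, 0.65 * L, 0.20 * H
xi, eta, zeta = 0.55 * W, 0.75 * L, 0.15 * H
EAR_OFFSET = 0.085
S = (X, Y, Z)

# A surface is (axis, plane position): axis 0 = x, 1 = y, 2 = z.
FLOOR, CEILING, WALL1, WALL3 = (2, 0.0), (2, H), (0, W), (0, 0.0)


def reflect(point, surface):
    """Mirror a point across the plane {coordinate[axis] == plane}."""
    axis, plane = surface
    p = list(point)
    p[axis] = 2.0 * plane - p[axis]
    return tuple(p)


def image(point, surfaces):
    """Apply reflections in the order the sound strikes the surfaces."""
    for surface in surfaces:
        point = reflect(point, surface)
    return point


def ear_distances(src):
    left = math.dist((xi, eta + EAR_OFFSET, zeta), src)
    right = math.dist((xi, eta - EAR_OFFSET, zeta), src)
    return left, right


for name, path in [("floor + wall 1", [FLOOR, WALL1]),
                   ("floor + wall 3", [FLOOR, WALL3]),
                   ("floor + ceiling", [FLOOR, CEILING])]:
    img = image(S, path)
    print(name, img, ear_distances(img))
```

Reflecting first across the floor ($z \to -Z$) and then across the ceiling ($z \to 2H + Z$) gives the template's $H + (H + Z)$; for the orthogonal floor and wall pairs the order of reflection does not change the image.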


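Two Reflections off Ceiling and Floor:

$$S_{CF} := \begin{pmatrix} X \\ Y \\ -(H + (H + Z)) \end{pmatrix}$$

Distance:

Left Ear:

$$\mathrm{Dist}_{LCF} := \sqrt{(\xi - X)^2 + \left[(\eta + b_2 \cdot .085) - Y\right]^2 + \left[\zeta + (2 \cdot H + Z)\right]^2} \qquad \mathrm{Dist}_{LCF} = 11.942$$

Right Ear:

$$\mathrm{Dist}_{RCF} := \sqrt{(\xi - X)^2 + \left[(\eta - b_2 \cdot .085) - Y\right]^2 + \left[\zeta + (2 \cdot H + Z)\right]^2} \qquad \mathrm{Dist}_{RCF} = 11.9135$$

Incidence Angle: (theta is azimuth and phi is elevation)

Left Ear: Azimuth

$$LSCF := S_{CF} - \mathrm{Ear}_L \qquad PLSCF := (LSCF \cdot e_F)\,e_F + (LSCF \cdot e_L)\,e_L \qquad f(P) := \mathrm{if}(P \cdot e_L < 0,\ -1,\ 1)$$

$$\theta_{LCF} := f(PLSCF)\,\operatorname{acos}\!\left(\frac{PLSCF \cdot e_F}{|PLSCF|}\right) \qquad \theta_{LCF} = -77.8208\,\mathrm{deg}$$

Elevation

$$\phi_{LCF} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{LSCF \cdot e_U}{|LSCF|}\right) \qquad \phi_{LCF} = -79.711\,\mathrm{deg}$$

Right Ear: Azimuth

$$RgtSCF := S_{CF} - \mathrm{Ear}_R \qquad PRgtSCF := (RgtSCF \cdot e_F)\,e_F + (RgtSCF \cdot e_L)\,e_L$$

$$\theta_{RCF} := f(PRgtSCF)\,\operatorname{acos}\!\left(\frac{PRgtSCF \cdot e_F}{|PRgtSCF|}\right) \qquad \theta_{RCF} = -76.7762\,\mathrm{deg}$$

Elevation

$$\phi_{RCF} := \frac{\pi}{2} - \operatorname{acos}\!\left(\frac{RgtSCF \cdot e_U}{|RgtSCF|}\right) \qquad \phi_{RCF} = -80.4958\,\mathrm{deg}$$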




Appendix B. Speaker Locations for HRTF Measurements

This appendix contains the data for the speaker locations used in measuring the
HRTFs. The following list provides the location and ITD for each of the 272 speakers
on the dome.


Speaker Locations and Interaural Time Delays
(angles in degrees, and time delays in microseconds)

Speaker   Azimuth    Elev.    ITD
      1     10.57        0     90
      2     31.72        0    260
      3     44.71        0    372
      4     57.98        0    470
      5     69.09        0    570
      6     82.41        0    658
      7     97.59        0    730
      8    111.09        0    770
      9    123.02        0    480
     10    135.29        0    388
     11    148.28        0    280
     12    169.43        0    100
     13    190.57        0     85
     14    211.72        0    274
     15    224.71        0    390
     16    236.98        0    480
     17    249.09        0    784
     18    262.41        0    744
     19    277.59        0    675
     20    290.91        0    586
     21    303.02        0    484
     22    315.25        0    380
     23    328.28        0    265
     24    349.43        0     90
     25         0    79.43      0
     26         0    58.28      0
     27         0    45.29      0
     28         0    32.77      0
     29         0    20.91      0
     30         0     7.59      0
     31         0    -7.59      0
     32         0   -20.91      0
     33         0   -32.77      0
     34         0   -45.29      0
     35         0   -58.28      0
     36         0   -79.43      0
     37       180   -79.43      0
     38       180   -58.28      0
     39       180   -45.29      0
     40       180   -32.77      0
     41       180   -20.91      0
     42       180    -7.59      0
     43       180     7.59      0
     44       180    20.91      0
     45       180    32.77      0
     46       180    45.29      0
     47       180    58.28      0
     48       180    79.43      0
     49        90    82.41     54
     50        90    69.09    186
     51        90    57.25    320
     52        90    44.71    450
     53        90    31.72    573
     54        90    10.57    690
     55        90   -10.57    648
     56        90   -31.72    510
     57        90   -44.71    428
     58        90   -57.25    334
     59        90   -69.09    210
     60        90   -82.41     66
     61       270   -82.41    112
     62       270   -69.09    264
     63       270   -57.25    396
     64       270   -44.71    460
     65       270   -31.72    556
     66       270   -10.57    684
     67       270    10.57    680
     68       270    31.72    564
     69       270    44.71    423
     70       270    57.25    320
     71       270    69.09    180
     72       270    82.41     50
     73      20.3     67.6     80
     74      20.5     52.4    120
     75      16.0     39.8    125
     76      12.9     27.1    114
     77      10.6     14.7    100
     78      55.0     72.0    150
     79      40.9     58.4    282
     80      34.9     44.2    212
     81      29.3     32.4    271
     82      24.9     20.1    215
     83      21.1      7.6    206
     84      66.3     60.2    344
     85      51.2     47.4    414
     86      45.0     35.3    449
     87      40.1     24.2    354
     88      35.8     12.3    324
     89      71.7     47.6    498
     90      59.5     36.0    532
     91      53.9     24.4    400
     92      49.1     12.2    411
     93      74.9     34.8    558
     94      68.2     23.3    648
     95      62.4     11.5    510
     96      81.9     20.9    630
     97      75.1     10.2    620
     98     159.7     67.6     90
     99     159.5     52.4    130
    100     164.0     39.8    130
    101     167.1     27.1    120
    102     169.4     14.7    105
    103     125.0     72.0    160
    104     139.1     58.4    210
    105     145.1     44.2    250
    106     150.7     32.4    245
    107     155.1     20.1    230
    108     158.9      7.6    200
    109     113.7     60.2    280
    110     128.8     47.4    327
    111     135.0     35.3    350
    112     139.9     24.2    340
    113     144.2     12.3    310
    114     108.3     47.6    400
    115     120.5     36.0    430
    116     126.1     24.4    440
    117     130.9     12.2    430
    118     105.1     34.8    520
    119     111.8     23.3    543
    120     117.6     11.5    530
    121      98.1     20.9    630
    122     104.9     10.2    730
    123     200.3     67.6     56
    124     200.5     52.4    110
    125     196.0     39.8    106
    126     192.9     27.1     90
    127     190.6     14.7     80
    128     235.0     72.0    124
    129     220.9     58.4    180
    130     214.9     44.2    220
    131     209.3     32.4    220
    132     204.9     20.1    205
    133     201.1      7.6    180
    134     246.3     60.2    250
    135     231.2     47.4    299
    136     225.0     35.3    320
    137     220.1     24.2    323
    138     215.8     12.2    310
    139     251.7     47.6    370
    140     239.5     36.0    431
    141     233.9     24.4    430
    142     229.1     12.2    420
    143     254.9     34.8    493
    144     248.2     23.3    530
    145     242.4     11.5    523
    146     261.9     20.9    610
    147     255.1     10.2    750
    148     339.7     67.6     50
    149     339.5     52.4    100
    150     344.0     39.8    113
    151     347.1     27.1     90
    152     349.4     14.7     80
    153     305.0     72.0    120
    154     319.1     58.4    170
    155     325.1     44.2    210
    156     330.7     32.4    210
    157     335.1     20.1    194
    158     338.9      7.6    170
    159     293.7     60.2    235
    160     308.8     47.4    303
    161     315.0     35.3    300
    162     319.9     24.2    303
    163     324.2     12.3    292
    164     288.3     47.6    360
    165     300.5     36.0    390
    166     306.1     24.4    404
    167     310.9     12.2    407
    168     285.1     34.8    470
    169     291.8     23.3    510
    170     297.6     11.5    518
    171     278.1     20.9    615
    172     284.9     10.2    628
    173      10.6    -14.7     80
    174      12.9    -27.1     90
    175      16.0    -39.8     80
    176      20.5    -52.4     90
    177      20.3    -67.6     60
    178      21.1     -7.6    172
    179      24.9    -20.1    190
    180      29.3    -32.4    205
    181      34.9    -44.2    180
    182      40.9    -58.4    158
    183      55.0    -72.0    370
    184      35.8    -12.3    280
    185      40.1    -24.2    290
    186      45.0    -35.3    270
    187      51.2    -47.4    250
    188      66.6    -60.2    266
    189      49.1    -12.2    386
    190      53.9    -24.2    394
    191      59.5    -36.0    340
    192      71.7    -47.6    540
    193      62.4    -11.5    490
    194      68.2    -23.2    470
    195      74.9    -34.8    458
    196      75.1    -10.2    590
    197      81.9    -20.9    556
    198     169.4    -14.7     90
    199     167.1    -27.1     92
    200     164.0    -39.8    100
    201     159.5    -52.4    102
    202     159.7    -67.6     60
    203     158.9     -7.6    190
    204     155.1    -20.1    200
    205     150.7    -32.4    200
    206     145.1    -44.2    200
    207     139.1    -58.4    170
    208     125.0    -72.0    300
    209     144.2    -12.3    370
    210     139.9    -24.2    290
    211     135.0    -35.3    280
    212     128.8    -47.4    270
    213     113.7    -60.2    298
    214     130.9    -12.2    400
    215     126.1    -24.4    380
    216     120.5    -36.0    360
    217     108.3    -47.6    368
    218     117.6    -11.5    480
    219     111.8    -23.3    462
    220     105.1    -34.8    454
    221     104.9    -10.2    780
    222      98.1    -20.9    550
    223     190.6    -14.7     83
    224     192.9    -27.1     90
    225     196.0    -39.8     90
    226     200.5    -52.4    110
    227     200.3    -67.6     80
    228     201.1     -7.6    180
    229     204.9    -20.1    190
    230     209.3    -32.4    207
    231     214.9    -44.2    212
    232     220.9    -58.4    190
    233     235.0    -72.0    348
    234     215.8    -12.3    300
    235     220.1    -24.2    300
    236     225.0    -35.3    290
    237     231.2    -47.4    286
    238     246.6    -60.2    307
    239     229.1    -12.2    402
    240     233.9    -24.4    390
    241     239.5    -36.0    372
    242     251.7    -47.6    379
    243     242.4    -11.5    490
    244     248.2    -23.2    470
    245     254.9    -34.8    470
    246     255.1    -10.2    790
    247     261.9    -20.9    576
    248     349.4    -14.7     90
    249     347.1    -27.1    100
    250     344.0    -39.8     95
    251     339.5    -52.4    100
    252     339.7    -67.6     72
    253     338.9     -7.6    180
    254     335.1    -20.1    196
    255     330.7    -32.4    208
    256     325.1    -44.2    190
    257     319.1    -58.4    170
    258     305.0    -72.0    320
    259     324.2    -12.3    296
    260     319.9    -24.2    306
    261     315.0    -35.3    288
    262     308.8    -47.4    270
    263     293.7    -60.2    418
    264     310.9    -12.2    400
    265     306.1    -24.4    390
    266     300.5    -36.0    360
    267     288.3    -47.6    560
    268     297.6    -11.5    510
    269     291.8    -23.3    510
    270     285.1    -34.8    500
    271     284.9    -10.2    610
    272     278.1    -20.9    590


Appendix C. Tables Comparing the HRTF Angles to the MathCad Output

The following seven tables provide a comparison of the angles of azimuth and
elevation generated by the MathCad template to the angles of azimuth and elevation
of the nearest speaker location. Within each table, the actual source and all virtual
sources used in the individual simulation are compared. Recall that the orientation
used to measure azimuth was different for the AAMRL HRTF measurements (speaker
locations) and the MathCad template. Figure 1 shows the orientation used in
determining the azimuth angle for the speaker locations, while Figure 8 shows the orientation
used to measure azimuth in the MathCad template. The number in parentheses in
the MathCad Azimuth column is the azimuth angle given by the orientation used in
the HRTF measurements.
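The tables compare each MathCad angle with the nearest speaker location, but the text does not say how "nearest" was measured. One reasonable choice, sketched below as an assumption, is the smallest great-circle angle between the two directions, with azimuth and elevation both expressed in the HRTF (dome) orientation. The speaker subset in the example is taken from Appendix B; the function names are hypothetical.

```python
import math


def angular_separation(az1, el1, az2, el2):
    """Great-circle angle in degrees between two (azimuth, elevation) directions."""
    a1, e1, a2, e2 = map(math.radians, (az1, el1, az2, el2))
    c = (math.sin(e1) * math.sin(e2)
         + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))  # clamp rounding error


def nearest_speaker(az, el, speakers):
    """Number of the speaker closest to (az, el); `speakers` maps number -> (az, el)."""
    return min(speakers, key=lambda n: angular_separation(az, el, *speakers[n]))


# A few entries from Appendix B (azimuth, elevation in the HRTF orientation).
SPEAKERS = {1: (10.57, 0.0), 29: (0.0, 20.91), 74: (20.5, 52.4), 173: (10.6, -14.7)}
print(nearest_speaker(11.0, 0.0, SPEAKERS))
```

With the full 272-entry list from Appendix B, the same call would select the dome speaker whose HRTF is closest in direction to each real or virtual source.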


SOURCE              SPEAKER                 MATHCAD
                    AZIMUTH    ELEVATION    AZIMUTH      ELEVATION
DIRECT PATH         10.57      ?            ?            0
REF. OFF WALL1      10.57      0            -3(3)        0
REF. OFF WALL2      277.59     0            81(276)      ?
REF. OFF WALL3      169.43     0            -178(178)    0
REF. OFF WALL4      ?          0            ?            0
REF. OFF FLOOR      12.9       -27.1        -11(11)      -26
REF. OFF CEIL.      20.3       67.6         -11(11)      70

Table 3. Comparison of Mathcad Output to Speaker Location used in the Simulation
for a Source at -10 Degrees Azimuth


                      SPEAKER                    MATHCAD
SOURCE                AZIMUTH      ELEVATION     AZIMUTH      ELEVATION
DIRECT PATH           [illegible]  0             180(180)     0
REF. OFF WALL1        10.57        0             0(0)         0
REF. OFF WALL2        262.41       0             98(262)      0
REF. OFF WALL3        169.43       0             [illegible]  0
REF. OFF WALL4        [illegible]  0             -112(112)    0
REF. OFF FLOOR        [illegible]  -20.91        180(180)     -20
REF. OFF CEIL.        180          58.28         180(180)     64.5

Table 4. Comparison of Mathcad Output to Speaker Location used in the Simula-
tion for a Source at 180 Degrees Azimuth


                      SPEAKER                    MATHCAD
SOURCE                AZIMUTH      ELEVATION     AZIMUTH      ELEVATION
DIRECT PATH           20.5         52.4          [illegible]  51
REF. OFF WALL1        0            20.91         -5(5)        20
REF. OFF WALL2        270          10.57         [illegible]  7
REF. OFF WALL3        169.1        14.7          -177(177)    11
REF. OFF WALL4        68.2         23.3          -71(71)      23
REF. OFF FLOOR        20.3         -67.6         -18(18)      -60
REF. OFF CEIL.        20.5         52.1          -18(18)      55

Table 5. Comparison of Mathcad Output to Speaker Location used in the Simula-
tion for a Source at 20 Degrees Azimuth, 50 Degrees Elevation


                      SPEAKER                    MATHCAD
SOURCE                AZIMUTH      ELEVATION     AZIMUTH      ELEVATION
DIRECT PATH           212.1        11.5          [illegible]  6
REF. OFF WALL1        [illegible]  0             [illegible]  1
REF. OFF WALL2        262.41       0             [illegible]  0.5
REF. OFF WALL3        [illegible]  0             [illegible]  1
REF. OFF WALL4        [illegible]  0             [illegible]  1
REF. OFF FLOOR        [illegible]  -36           [illegible]  -37
REF. OFF CEIL.        235.0        72            117(213)     74

Table 6. Comparison of Mathcad Output to Speaker Location used in the Simula-
tion for a Source at 120 Degrees Azimuth


                      SPEAKER                    MATHCAD
SOURCE                AZIMUTH      ELEVATION     AZIMUTH      ELEVATION
DIRECT PATH           126.1        -24.4         [illegible]  -17
REF. OFF WALL1        10.57        0             -8(8)        -3
REF. OFF WALL2        277.59       0             92(268)      -1
REF. OFF WALL3        169.43       0             -173(173)    -2.5
REF. OFF WALL4        104.9        -10.2         -97(97)      -5
REF. OFF FLOOR        120.5        -36.0         [illegible]  -19
REF. OFF CEIL.        125.0        72.0          -117(117)    76

Table 7. Comparison of Mathcad Output to Speaker Location used in the Simula-
tion for a Source at -120 Degrees Azimuth


                      SPEAKER                    MATHCAD
SOURCE                AZIMUTH      ELEVATION     AZIMUTH      ELEVATION
DIRECT PATH           75.1         10.2          -77(77)      7
REF. OFF WALL1        10.57        0             -9(9)        1
REF. OFF WALL2        277.59       0             89(271)      0.5
REF. OFF WALL3        [illegible]  0             [illegible]  1
REF. OFF WALL4        [illegible]  0             [illegible]  2
REF. OFF FLOOR        71.9         -34.8         -77(77)      -10
REF. OFF CEIL.        90           82.41         [illegible]  76

Table 8. Comparison of Mathcad Output to Speaker Location used in the Simula-
tion for a Source at -75 Degrees Azimuth


                      SPEAKER                    MATHCAD
SOURCE                AZIMUTH      ELEVATION     AZIMUTH      ELEVATION
DIRECT PATH           284.9        10.2          77(283)      7
REF. OFF WALL1        349.43       0             9(351)       1
REF. OFF WALL2        277.59       0             89(271)      1
REF. OFF WALL3        190.57       0             173(187)     1
REF. OFF WALL4        82.41        0             -88(88)      1
REF. OFF FLOOR        285.1        -34.8         77(283)      -40
REF. OFF CEIL.        293.7        60.2          77(283)      76
FLOOR, WALL1          338.9        -7.6          9(351)       -8
FLOOR, WALL3          180          -7.59         173(187)     -6
FLOOR, CEILING        [illegible]  82.41         77(283)      79
CEILING, FLOOR        [illegible]  -82.41        77(283)      -80

Table 9. Comparison of Mathcad Output to Speaker Location used in the Simula-
tion for a Source at 75 Degrees Azimuth


Appendix  D.  ESPS  Manual  Pages  and  C  Code 


This appendix contains the manual pages for each of the commands used in
developing the binaural signals produced in this thesis. The C code for the delay
command is also included.


D.1 Addsd

Manual  Pages  for  the  ESPS  Command  "addsd": 

NAME

addsd - add ESPS sampled data files with optional scaling
multsd - multiply ESPS sampled data files with optional
scaling
SYNOPSIS

addsd [ -x debug_level ] [ -r range ] [ -p range ] [ -g
scale ] [ -z ] [ -t ] file1 file2 file3
multsd [ -x debug_level ] [ -r range ] [ -p range ] [ -g
scale ] [ -z ] [ -t ] file1 file2 file3
DESCRIPTION 

Addsd (1-ESPS) and multsd (1-ESPS) are the same binary file.
The function the program performs (either adding or multiplying)
depends on the name that is used to call it. The options and
syntax are the same for both adding and multiplying. Because
the calling name is used in the program logic, neither addsd
(1-ESPS) nor multsd (1-ESPS) can be linked or copied to
new names. Below only addsd (1-ESPS) is described, but the
description of multsd (1-ESPS) is completely analogous.

Addsd takes sampled data from file1, adds it sample-by-sample
to the sampled data in file2, possibly scaling the
data in file2 first, and outputs the results as an ESPS
FEA_SD file file3. If there are not enough records in file2,
and if the -t option is not used, addsd reuses file2,
starting with its first record.

Both file1 and file2 must be ESPS FEA_SD files, and they
must have the same sampling frequency (record_freq);
otherwise addsd exits with an error message. The output file
data type is selected to "cover" the two input data types.
That is, all values of the input types can be stored in the
output type. For example, if one file is type SHORT_CPLX and
the other is type FLOAT, the output type is FLOAT_CPLX. If
"-" is supplied in place of file1, then standard input is
used. If "-" is supplied in place of file3, standard output
is used.

OPTIONS 

The  following  options  are  supported: 

-x debug_level

If  debug_level  is  positive,  addsd  prints  debugging 
messages  and  other  information  on  the  standard  error 
output.  The  messages  proliferate  as  the  debug_level 
increases.  If  debug_level  is  0,  no  messages  are 
printed.  The  default  is  0.  Levels  up  through  2  are 
supported  currently. 

-p range

Selects a subrange of records from file1 using the
format start-end or start:end. Either start or end may
be omitted, in which case the omitted parameter
defaults respectively to the start or end of file1. The
first record in file1 is considered to be frame 1,
regardless of its position relative to any original
source file. The default range is the entire input
file file1. The selected subrange from file1 is then
added to the (possibly scaled) data from file2,
starting with the first record of file2. If the
subrange does not exist in file1, addsd exits with an
error message.

-r  range 

-r  is  a  synonym  for  -p. 

-g  scale 

Causes addsd to multiply the data in file2 by scale
before adding it to the data in file1. The format for
scale is either integer or floating point.

-z Suppresses warning messages that normally are generated
if the contents of file2 are used more than once.

-t Truncates the processing if there are not enough
records in file2. In this case, file3 will contain as
many records as there are in file2 or, if the -p option
is used, as many records as in the intersection of
range and the full range of file2.
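The pairing of records described above (reuse file2 from its first record by default, stop at the end of file2 with -t) can be sketched on in-memory arrays. This is a hypothetical helper, not ESPS code: `addsd_sketch` and its array interface are assumptions, and data-type promotion, -p ranges and the -z warning are left out.

```c
#include <stddef.h>

/* out[k] = f1[k] + scale * f2[k], wrapping back to the start of f2 when it
 * runs out (addsd's default), or stopping at the end of f2 when stop_at_end
 * is nonzero (the -t case).  Returns the number of output samples. */
size_t addsd_sketch(const double *f1, size_t n1, const double *f2, size_t n2,
                    double scale, int stop_at_end, double *out)
{
    size_t k, n = n1;

    if (n2 == 0)
        return 0;                             /* nothing to add */
    if (stop_at_end && n2 < n)
        n = n2;                               /* -t: truncate to file2 */
    for (k = 0; k < n; k++)
        out[k] = f1[k] + scale * f2[k % n2];  /* k % n2 reuses file2 */
    return n;
}
```

With file1 = {1, 2, 3, 4, 5}, file2 = {10, 20} and -g 0.5, the default output is {6, 12, 8, 14, 10}; with -t it stops after {6, 12}.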

ESPS  PARAMETERS 

The  ESPS  parameter  file  is  not  read  by  addsd. 

ESPS  COMMON 

If Common processing is enabled, the following items are
read from the ESPS Common File provided that file1 is not
standard input, and provided that the Common item filename
matches the input file name file1:

start - integer

This is the starting point in file1.
nan - integer

This is the number of points to add from file1.

If start and/or nan are not given in the common file,
or if the common file can't be opened for reading, then
start defaults to the beginning of the file and nan
defaults to the number of points in the file. In all
cases, values of start and nan are ignored if the -p option is
used.

If  Common  processing  is  enabled,  the  following  items  are 
written  into  the  ESPS  Common  file  provided  that  the  output 
file  is  not  <stdout>: 

start - integer

The starting point (1) in the output file file3.
nan - integer

The number of points in the output file file3.
prog - string

This is the name of the program (addsd in this case).

filename - string

The name of the output file file3.

ESPS Common processing may be disabled by setting the
environment variable USE_ESPS_COMMON to off. The default
ESPS Common file is .espscom in the user's home directory.
This may be overridden by setting the environment variable
ESPSCOM to the desired path. User feedback of Common
processing is determined by the environment variable
ESPS_VERBOSE, with 0 causing no feedback and increasing
levels causing increasingly detailed feedback. If
ESPS_VERBOSE is not defined, a default value of 3 is
assumed.

ESPS  HEADERS 

The following items are copied from the header of file1 to
the header of file3:

variable.comment
variable.refer
record_freq


If the -g option is used, a generic header item scale is
added to the output file header that contains the -g
specified value. Max_value in file3 is not set.

The generic header item start_time is written in the output
file. The value written is computed by taking the start_time
value from the header of the first input file (or zero, if
such a header item doesn't exist) and adding to it the
relative time from the first record in the file to the first
record processed.

SEE  ALSO 

ESPS (5-ESPS), FEA_SD (5-ESPS), record (1-ESPS), range
(1-ESPS), copysd (1-ESPS)

WARNINGS 

If there are not enough records in file2 (i.e., if addsd
has to start over at the beginning of file2), a warning
message is printed. This warning is inhibited by the -z
option, and does not apply if -t is used.


BUGS 


None  known. 

AUTHOR 

Ajaipal  S.  Virdy. 

Modified  for  ESPS  3.0  by  David  Burton  and  John  Shore. 
Modified  for  FEA_SD  by  David  Burton. 

Modified  to  support  multiplication  by  David  Burton. 


D.2 Delay

The following is the C code written by Capt. John Colombi for the "delay"
command used during this thesis work.


/*
 * delay: delay (in milliseconds or in whole samples) and scale an ESPS
 * FEA_SD sampled-data file.  The sampling rate is assumed to be 40 kHz.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ESPS includes */
#include "esps/esps.h"
#include "esps/feasd.h"
#include "esps/fea.h"
#include "esps/unix.h"

#define SAMPLE_RATE 40000.0     /* samples per second (hard-coded) */

/* Round a non-negative value to the nearest integer. */
#define MYROUND(a)  ( (a) - (int)(a) >= .5 ? ((int)(a) + 1) : ((int)(a)) )

#define SYNTAX USAGE("delay [-x debug_level] [-h] -d millisec -s scale " \
                     "-j jog#samples infile outfile");

char *get_cmd_line();
long debug_level = 0;

extern int optind;
extern char *optarg;

static void delay_waveform(short *waveform, short *delayed, int num_pts,
                           float delay, float gain, int jogsample);
static void printhelp(void);

int main(int argc, char **argv)
{
    FILE *in_fptr = stdin, *out_fptr = stdout;
    struct header *ih, *oh;
    struct feasd *in_feasd_rec, *temp_rec;
    short *in_data, *delay_wave;
    int num_in_pts;
    int jogsample = 0;
    int c, num_of_files;
    char *in_fname = "stdin", *out_fname = "stdout";
    char *ProgName = "delay";
    char *Version = "1.3";
    char *Date = "Today";
    char *cmd_line;
    float delay = 0.0;
    float gain = 1.0;

    /* Check the command line options. */
    cmd_line = get_cmd_line(argc, argv);    /* store copy of command line */
    while ((c = getopt(argc, argv, "x:d:j:s:h")) != EOF) {
        switch (c) {
        case 'x':
            debug_level = atoi(optarg);
            break;
        case 'd':
            delay = atof(optarg);
            fprintf(stderr, "delay signal by %f millisecs.\n", delay);
            break;
        case 's':
            gain = atof(optarg);
            fprintf(stderr, "scale signal by %f.\n", gain);
            break;
        case 'j':
            jogsample = atoi(optarg);
            fprintf(stderr, "jog signal by %d.\n", jogsample);
            break;
        case 'h':
            printhelp();
            exit(0);
        default:
            SYNTAX;
            break;
        }
    }

    /* Get the filenames: exactly one input and one output file. */
    if ((num_of_files = argc - optind) != 2) {
        fprintf(stderr, "Improper filename specification.\n");
        SYNTAX;
    }

    /* Open the files, reading the header of the input file. */
    in_fname = eopen(ProgName, argv[optind++], "r", FT_FEA, FEA_SD,
                     &ih, &in_fptr);
    out_fname = eopen(ProgName, argv[optind], "w", NONE, NONE,
                      (struct header **) NULL, &out_fptr);
    if (debug_level) {
        fprintf(stderr, "input filename = %s\n", in_fname);
        fprintf(stderr, "output filename = %s\n", out_fname);
    }

    /* Read the whole input file into memory. */
    num_in_pts = ih->common.ndrec;
    in_feasd_rec = allo_feasd_recs(ih, SHORT, num_in_pts, (char *) NULL, NO);
    temp_rec = allo_feasd_recs(ih, SHORT, num_in_pts, (char *) NULL, NO);
    get_feasd_recs(in_feasd_rec, 0L, num_in_pts, ih, in_fptr);
    in_data = (short *) in_feasd_rec->data;
    delay_wave = (short *) temp_rec->data;

    delay_waveform(in_data, delay_wave, num_in_pts, delay, gain, jogsample);

    /* -- set up the output file. -- */
    oh = copy_header(ih);
    add_source_file(oh, in_fname, ih);
    (void) strcpy(oh->common.prog, ProgName);
    (void) strcpy(oh->common.vers, Version);
    (void) strcpy(oh->common.progdate, Date);
    write_header(oh, out_fptr);

    /* temp_rec's data buffer now holds the delayed, scaled waveform. */
    put_feasd_recs(temp_rec, 0L, num_in_pts, oh, out_fptr);

    fclose(in_fptr);
    fclose(out_fptr);

    if (debug_level)
        fprintf(stderr, "Program Completed.\n");
    return 0;
}

/* ------------------------------------------------------------------ */

/*
 * Copy waveform into delayed, shifted later in time by the requested delay
 * and multiplied by gain.  The vacated leading samples are filled with the
 * (scaled) first input sample, since the background signal is often not
 * all zeros.  A nonzero jogsample overrides the millisecond delay.
 */
static void delay_waveform(short *waveform, short *delayed, int num_pts,
                           float delay, float gain, int jogsample)
{
    int pt, cpptr, samples;

    samples = MYROUND(delay * SAMPLE_RATE / 1000.0);
    if (jogsample)
        samples = jogsample;
    if (samples < 0)
        samples = 0;            /* negative shifts are not supported */
    if (samples > num_pts)
        samples = num_pts;      /* keep every write inside the buffer */
    if (debug_level)
        fprintf(stderr, "Delaying by %d samples\n", samples);

    for (cpptr = samples, pt = 0; pt < num_pts - samples; pt++, cpptr++)
        delayed[cpptr] = waveform[pt] * gain;

    for (cpptr = 0; cpptr < samples; cpptr++)
        delayed[cpptr] = waveform[0] * gain;
}

/* ------------------------------------------------------------------ */

static void printhelp(void)
{
    fprintf(stderr, "\nUSAGE: delay [-x debug_level] [-h] -d millisec "
                    "-s scale -j jog#samples infile outfile\n");
    fprintf(stderr, "\nDelay an ESPS sampled data file\n\n");
    fprintf(stderr, "Flags include\n");
    fprintf(stderr, "\t -d delay : delay by so many millisecs, fills with "
                    "first sample\n\t\t (since often background signal is "
                    "not all 0's)\n");
    fprintf(stderr, "\t -s scale : scale the samples by this float value\n");
    fprintf(stderr, "\t -j samples : instead of delay in millisecs, delay "
                    "by jogging this number of samples\n");
}
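The arithmetic in delay_waveform can be checked without the ESPS library. At the hard-coded 40 kHz rate, a delay of d milliseconds becomes d * 40000/1000 = 40d samples, so "-d 0.5" shifts the signal by 20 samples. The stand-alone sketch below mirrors the shift, scale and fill-with-first-sample logic on plain arrays; `delay_samples` and `shift_and_scale` are hypothetical names, not part of the original program.

```c
/* Sampling rate assumed by the delay program (hard-coded 40 kHz). */
#define DELAY_RATE 40000.0

/* Milliseconds -> whole samples, rounded to nearest; negative -> 0. */
int delay_samples(double millisec)
{
    double s = millisec * DELAY_RATE / 1000.0;
    return s < 0.0 ? 0 : (int) (s + 0.5);
}

/* Shift `in` later by `samples`, scale by `gain`, and fill the vacated
 * start with the scaled first sample, as delay_waveform does. */
void shift_and_scale(const short *in, short *out, int n, int samples, float gain)
{
    int i;

    if (samples < 0)
        samples = 0;
    if (samples > n)
        samples = n;                 /* never write past the buffer */
    for (i = samples; i < n; i++)
        out[i] = (short) (in[i - samples] * gain);
    for (i = 0; i < samples; i++)
        out[i] = (short) (in[0] * gain);
}
```

For the input {5, 1, 2, 3, 4}, a 2-sample shift with gain 2 gives {10, 10, 10, 2, 4}.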


D.3  Filter 


Manual Pages for the ESPS Command "filter":


NAME 

filter  -  Performs  digital  filtering  on  a  sampled  data  file. 
SYNOPSIS 

filter [ -P param_file ] [ -p range ] [ -r range ] [ -d
data_type ] [ -f filtername ] [ -F filt_file ] [ -x
debug_level ] [ -i up/down ] [ -z ] in_file out_file

DESCRIPTION 

The program filter takes the input sampled data file,
in_file, and produces an output sampled data file, out_file,
after performing a digital filtering operation on it. The
output sampled data file is of type FEA_SD (5-ESPS). Filter
allows the user to change the data type of the output file
by using the -d option; see below for more details. The
program accepts "-" for either the input file or the output
file to use the standard input and standard output,
respectively.


The program may implement either finite impulse response
(FIR) or infinite impulse response (IIR) filters. A set of
numerator coefficients and (optionally) a set of denominator
coefficients are specified either in the parameter file or
in the FEAFILT file. Currently, only real filters may be
used; if a filter is complex, its imaginary part is
ignored. The numerator coefficients then become the {a_i}
and the denominator coefficients become the {b_i} in the
following z-domain transfer function:


$$H(z) = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_{m-1} z^{-(m-1)}}{b_0 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_{n-1} z^{-(n-1)}}$$


An FIR filter corresponds to the case where, in the equation
above, b_0 = n = 1. An FIR filter may also be specified by
choosing the order of the denominator to be zero, and
entering no denominator coefficients at all.

The program uses a different initialization procedure for
FIR filters than it uses for IIR filters. For FIR filtering,
the first output will be computed from data samples
occurring before the starting point in the input file (as
defined by the parameter start, for example), if they
exist. Data samples which would occur before the first
sample in the input file are assumed to be zero. For IIR
filtering, all inputs and outputs occurring before the
starting point are assumed to be zero.
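Cross-multiplying the transfer function gives the difference equation b_0 y[k] + b_1 y[k-1] + ... + b_{n-1} y[k-n+1] = a_0 x[k] + a_1 x[k-1] + ... + a_{m-1} x[k-m+1]. The sketch below solves it for y[k] with every sample before the start taken as zero. That matches the IIR initialization above, and the FIR case only when filtering begins at the first sample of the file. It is an illustration, not the ESPS filter source; `filter_sketch` is a hypothetical name, and a filter with no denominator (nb == 0) is treated as b_0 = 1.

```c
/* y[k] = (sum_i a[i] x[k-i] - sum_{j>=1} b[j] y[k-j]) / b[0], with all
 * samples before k = 0 taken as zero.  nb == 0 means no denominator. */
void filter_sketch(const double *a, int m, const double *b, int nb,
                   const double *x, double *y, int len)
{
    int k, i;

    for (k = 0; k < len; k++) {
        double acc = 0.0;
        for (i = 0; i < m && i <= k; i++)
            acc += a[i] * x[k - i];          /* numerator (feed-forward) */
        for (i = 1; i < nb && i <= k; i++)
            acc -= b[i] * y[k - i];          /* denominator (feedback) */
        y[k] = (nb > 0) ? acc / b[0] : acc;
    }
}
```

For example, a = {1}, b = {1, -0.5} is a one-pole IIR filter whose impulse response is 1, 0.5, 0.25, 0.125, ...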

OPTIONS 

The  following  options  are  supported: 

-P  param_file 

uses the parameter file param_file rather than the
default, which is params.

-r  range 

Perform the filtering operation on the specified range
of points. range is a character string which is
interpreted in the format understood by range_switch
(3-ESPSu). -r and -p are synonyms.

-p range

Perform the filtering operation on the specified range
of points. range is a character string which is
interpreted in the format understood by range_switch
(3-ESPSu). -r is a synonym for -p.

-d  data_type 

The argument data_type is a character representing the
desired output data type in out_file: b for byte, s for
short, l for long, f for float, and d for double. This
data type conversion is often useful when the input
data type is short and the filtering operation produces
sample values greater in magnitude than 2^15. The
output type is real or complex in agreement with the
input type; for example if d is specified, the output
type is DOUBLE if the input is real and DOUBLE_CPLX if
the input is complex.

-f  filtername 

If the coefficients are being read from the parameter
file, then filtername is the body of the name of the
variable that contains the number of coefficients and
the actual coefficients. This means that the
coefficients will be found in the arrays filtername_num
and filtername_den and the size of those arrays will be
specified by filtername_nsiz and filtername_dsiz,
respectively. The default name in this case is filter.

If the coefficients are being read from a FEAFILT file,
then filtername is the number of the filter record to
use. The default name in this case is 1, the first
record in the file.

-F  filt_file 

Read  the  coefficients  from  the  FEAFILT  file  filt_file 
rather  than  from  the  parameter  file.  In  this  case  the 
header  of  filt_file  is  added  to  the  header  of  the 
program  output  as  a  source  file. 


-x debug_level

A  value  of  zero  (the  default  value)  will  cause  filter 
to  do  its  work  silently,  unless  there  is  an  error.  A 
nonzero  value  will  cause  various  parameters  to  be 
printed  out  during  program  initialization. 

-i  up/down 

Perform  interpolation  filtering  such  that  the  output 
sampling  rate  is  equal  to  (src_sf)* (up/down) .  Both  up 
and  down  are  integers  less  than  10.  Effectively,  the 
program  increases  the  sampling  rate  to  up*(src_sf), 
filters  this  signal  with  the  specified  filter,  and  then 
downsamples  the  resulting  signal  by  a  factor  of  down. 

-z By specifying -z, the start_time generic value is
reduced by the value of the delay_samples generic
header value, if it exists. This often helps time align
a filtered signal with the input signal. Note that if
delay_samples is not defined in the input file header,
a value of 0 is assumed.
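The -i option described above amounts to zero-stuffing by up, filtering, and keeping every down-th sample. The direct (non-polyphase) sketch below follows that description; it is an assumption about the structure, not the ESPS implementation, and `resample_sketch` is a hypothetical name. Zero-stuffing lowers the signal level by a factor of up, so an interpolation filter normally has a DC gain of up to compensate, as the taps {0.5, 1, 0.5} do for up = 2.

```c
/* Zero-stuff x by `up`, filter with the FIR taps h[0..nh-1], and keep every
 * `down`-th sample.  Only the nonzero (original) samples of the stuffed
 * signal are multiplied.  Returns the number of output samples written. */
int resample_sketch(const double *x, int n, int up, int down,
                    const double *h, int nh, double *y)
{
    int k, j, count = 0;
    int len = n * up;                   /* length of the zero-stuffed signal */

    for (k = 0; k < len; k += down) {   /* compute only the kept outputs */
        double acc = 0.0;
        for (j = 0; j < nh && j <= k; j++)
            if ((k - j) % up == 0)      /* skip the inserted zeros */
                acc += h[j] * x[(k - j) / up];
        y[count++] = acc;
    }
    return count;
}
```

With x = {0, 2, 4}, up = 2, down = 1 and h = {0.5, 1, 0.5}, the output is {0, 0, 1, 2, 3, 4}: a linear interpolation delayed by one output sample.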

ESPS  PARAMETERS 

The values of parameters obtained from the parameter file
are printed if the environment variable ESPS_VERBOSE is 3 or
greater. The default value is 3.

The following parameters are read from the parameter file:
start - integer

The first point in the input sampled data file that is
processed. The samples are assumed to be numbered
starting with one so that setting start = 1 will cause
processing to begin with the first sample.

nan - integer

The number of points in the sampled data file to
process.

filtername_nsiz - integer

The number of numerator coefficients in the transfer
function for the filter filtername.

filtername_dsiz - integer

The number of denominator coefficients in the transfer
function for the filter filtername. A value of zero
means that a denominator coefficient array need not be
entered.

filtername_num - float array

The numerator coefficients. They are specified in
order starting with a_0.

filtername_den - float array

The denominator coefficients. They are specified in
order starting with b_0.


ESPS  COMMON 

If the input is standard input, COMMON is not read. The
following items may be read from COMMON:

filename - string

This is the name of the input file. If the command line
specifies only one filename, it is assumed to be the
output filename and COMMON is read to get the input
filename. If the input filename is specified on the
command line, it must match filename in COMMON or the
other items (below) are not read.


start - integer

This is the starting point in the input file.
nan - integer

This is the number of points to process.


If the output is standard output, COMMON is not written.
Otherwise the following items are written to COMMON.

filename - string

This is the name of the output file.
start - integer

This is the starting point in the output file and is
always equal to one.

nan - integer

This is the number of points in the output file.

ESPS Common processing may be disabled by setting the
environment variable USE_ESPS_COMMON to "off". The default
ESPS Common file is .espscom in the user's home directory.
This may be overridden by setting the environment variable
ESPSCOM to the desired path. User feedback of Common
processing is determined by the environment variable
ESPS_VERBOSE, with 0 causing no feedback and increasing
levels causing increasingly detailed feedback. If
ESPS_VERBOSE is not defined, a default value of 3 is
assumed.

ESPS  HEADER 

The file header of out_file will contain mostly the same
information as is contained in that of in_file, except where
they are altered by the parameters in the parameter file.

The -i option changes the sf header item. The filter
coefficients will be stored in the output header as the
filter zfunc.

A generic header item start_time (type DOUBLE) is added to
the output file header. It contains the starting time in
seconds of the first point in the output file. This start
time is relative to the original sampled data file. This
means that if the input file has a start_time generic in it,
the output file's start_time value is computed relative to
the input file's start_time. Also see the -z option.


For example, if the input file has a start_time = 1.0
seconds, the input file's sampling frequency = 8000
samples/second, and the starting point in the input file =
2000, the output file's start_time = 1.0 + 2000/8000 = 1.25
seconds.

SEE  ALSO 

notch_filt (1-ESPS), FEAFILT (5-ESPS), atofilt (1-ESPS),
wmse_filt (1-ESPS), iir_filt (1-ESPS)

BUGS

None known.

AUTHOR 

Brian Sublett; ESPS 3.0 modifications by David Burton.
FEA_SD modifications by David Burton; multichannel and
complex FEA_SD modifications by Bill Byrne.


D.4 Mux

Manual Pages for the ESPS Command "mux":

NAME 

mux  -  multiplex  sampled-data  files  into  a  single 
multichannel  or  complex  file 
SYNOPSIS 

mux [ -{prs} range ] . . . [ -x debug_level ] [ -J ] [ -P
param_file ] input1.fsd [ input2.fsd . . . ] output.fsd
DESCRIPTION 

The mux (''multiplex'') program combines its input sampled-
data (FEA_SD) files, or equal-length portions of them, into
a single multichannel output sampled-data file, possibly
also combining real channels in pairs to form complex
channels.

Normally the number of output channels is the total number
of channels in all the input files. Each output record
contains the data from the samples field in one record of
each input file. This is organized as a single vector
containing one sample value from each channel of each input
file. Within the vector, the data from the first input file
comes first, followed by the data from the second, and so
on, in the order of the file names on the command line. The
channels of any one input file keep the same relative
ordering in the output file that they had in the input file.
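For single-channel inputs, the record layout above reduces to interleaving: output record r holds sample r of the first file, then of the second, and so on, and processing stops when the shortest input runs out. A minimal sketch on in-memory arrays follows; `mux_sketch` is a hypothetical helper, and multichannel inputs, ranges and -J are ignored.

```c
/* Interleave nfiles single-channel inputs into one multichannel buffer:
 * out[r * nfiles + f] = in[f][r].  Stops at the shortest input and returns
 * the number of output records.  Assumes nfiles >= 1. */
int mux_sketch(const short *const *in, const int *len, int nfiles, short *out)
{
    int f, r, nrec = len[0];

    for (f = 1; f < nfiles; f++)
        if (len[f] < nrec)
            nrec = len[f];               /* the shortest input ends the run */
    for (r = 0; r < nrec; r++)
        for (f = 0; f < nfiles; f++)
            out[r * nfiles + f] = in[f][r];
    return nrec;
}
```

Multiplexing {1, 2, 3} with {10, 20} therefore yields two records, {1, 10} and {2, 20}.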

When  the  -J  option  is  used,  the  number  of  output  channels  is 
only  half  the  normal  number;  input  channels  of  a  real  data 
type  are  combined  in  pairs  into  single  output  channels  of 
the  corresponding  complex  type. 

The  input  files  must  be  consistent  in  data  type  and  sampling 
frequency;  the  output  file  has  the  same  data  type  (unless  -J 
is  used)  and  the  same  sampling  frequency  as  the  input  files. 
Any  fields  other  than  samples  in  the  input  are  ignored. 

By default, the first output record contains data from the
first record in each input file, and in general the nth
output record contains data from the nth record in each
input file; but a later starting point in each input file
can be chosen with the -p, -r, or -s option. By default,
records are processed until an input file runs out of data,
but a shorter range of data can be chosen with -p, -r, or
-s.

If "-" is written for an input file, the standard input is
used. At most one input file may be standard input. Names
of disk files, however, may be repeated (duplicating
channels). Since different -p, -r, and -s options may apply
to each instance of a repeated input file name, it is
possible to align and juxtapose different portions of a
single input file. If "-" is written for the output, the
standard output is used.

OPTIONS 

The  following  options  are  supported: 

-p  range 

For  this  program  -p  and  -r  are  synonymous.  See  -r  for 
the  interpretation  and  the  format  of  the  argument. 

-r first:last

-r first:+incr

Determines the range of points (records) to be taken
from an input file. In the first form, a pair of
unsigned integers gives the numbers of the first and
last records of the range. (Counting starts with 1 for
the first record in the file.) If first is omitted, 1
is used. If last is omitted, the last record in the
file is used. The second form is equivalent to the
first with last = first + incr.

This option and the -p and -s options may be repeated
up to a maximum total number, for all three kinds, of
the number of input files. The first -p, -r, or -s
option applies to the first input file, the second to
the second, and so on. If there are fewer -p, -r, and
-s options than input files, the last such option
applies to all the remaining input files. In
particular, if there is only one -p, -r, or -s option,
it applies to all the input files.

If two options disagree as to the number of records to
be processed, the smaller number applies. In fact mux
stops processing as soon as it encounters either the
end of a specified range or the actual end of file in
any input file. Certain inconsistencies in these
various stopping criteria will evoke warning messages;
see the Diagnostics section for details.

-s start:end

-s start:+incr

Determines the range in seconds of the data to be taken
from an input file. In the first form, a pair of
floating-point numbers gives the beginning time and
ending time of the range. The second form is
equivalent to the first with end = start + incr. Each
sample has a time given by s + (r-1)/f, where s is the
value of the generic header item "start_time", r is the
record number, and f is the sampling frequency, given
by the generic header item "record_freq". This time
may depend on the channel number, since the
"start_time" item may be a vector with a component per
channel; for present purposes the value for the first
channel (number 0) is used. The range selected by the
-s option consists of the records for which the time is
less than end but not less than start.

This option and the -p and -r options may be repeated
to supply different ranges for different input files.
See the -r option for details.

-x debug_level

If debug_level is positive, the program prints
debugging messages as it progresses; the higher the
number, the more messages. The default level is 0, for
no debugging output.

-J  Join  pairs  of  input  channels  to  form  single  complex 
output  channels.  The  total  number  of  channels  in  the 
input files must be even, and the output file has half
that  number  of  channels.  The  input  channels  are  taken 
in  the  usual  order  and  grouped  in  pairs  to  form  the 
real  and  imaginary  parts  of  the  output  channels.  The 
pairing  is  without  regard  to  whether  two  input  channels 
come  from  the  same  input  file  or  consecutive  files. 

The  last  channel  of  a  file,  if  not  paired  with  the 
previous  input  channel,  is  paired  with  the  first 
channel  of  the  next  input  file. 

The input files must all have the same real data type:
DOUBLE, FLOAT, LONG, SHORT, or BYTE. (See FEA(5-ESPS)
for an explanation of these type codes.) The output
file has the corresponding complex data type:
DOUBLE_CPLX, FLOAT_CPLX, LONG_CPLX, SHORT_CPLX, or
BYTE_CPLX.

If two channels with different time alignments are
combined  into  one  complex  channel,  time-alignment 
information  may  be  lost.  A  warning  message  is  printed 
in  that  case.  See  the  discussion  of  the  "start_time" 
generic  header  item  in  the  section  on  ESPS  Headers. 
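A minimal Python sketch of the -J pairing order, assuming each input file is described only by its channel count; the helper and the type table are illustrative, not part of ESPS. Channels are numbered (file_index, channel_index) from 0, and an odd total is rejected, as in DIAGNOSTICS.

```python
# ESPS real types and the complex types -J produces from them.
COMPLEX_TYPE = {
    "DOUBLE": "DOUBLE_CPLX",
    "FLOAT": "FLOAT_CPLX",
    "LONG": "LONG_CPLX",
    "SHORT": "SHORT_CPLX",
    "BYTE": "BYTE_CPLX",
}


def pair_channels(channels_per_file):
    """Group input channels, taken in file order and then channel order,
    into (real, imaginary) pairs of (file_index, channel_index)."""
    flat = [(f, c) for f, n in enumerate(channels_per_file) for c in range(n)]
    if len(flat) % 2:
        raise ValueError("-J needs an even total number of input channels")
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


# A 3-channel file followed by a 1-channel file: the last channel of the
# first file pairs with the first channel of the second.
print(pair_channels([3, 1]))
# [((0, 0), (0, 1)), ((0, 2), (1, 0))]
```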

-P  param_file 

The name of the parameter file. The default name is
"params".

ESPS  PARAMETERS 

The parameter file is not required to be present, as there
are default values for all parameters. If the file exists,
the following parameters may be read if they are not
determined by command-line options.

start  -  integer  array 

The  starting  record  number  in  each  input  file.  The 
array  elements  are  matched  with  input  files  in  order. 

If  there  are  more  input  files,  the  last  array  element 
applies  to  the  unmatched  file.  If  there  are  more  array 
elements,  the  unmatched  ones  are  ignored.  This 
parameter  is  not  read  if  the  -p,  -r,  or  -s  option  is 
specified. The default is all 1s, meaning the
beginning  of  each  input  file. 
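The matching of the start array to input files can be sketched as follows; the function name is hypothetical, and an empty list stands in for an absent parameter.

```python
def match_start_records(start_param, n_files):
    """Assign the "start" parameter array to input files in order; the last
    element covers any extra files, and surplus elements are ignored."""
    if not start_param:
        return [1] * n_files  # default: beginning of every input file
    last = len(start_param) - 1
    return [start_param[min(i, last)] for i in range(n_files)]


print(match_start_records([1001, 2501], 3))  # [1001, 2501, 2501]
print(match_start_records([5, 6, 7], 2))     # [5, 6]
```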

nan  -  integer 

The  number  of  records  to  process  in  each  input  file.  A 
value  of  0  (the  default)  means  continue  processing 
until  the  end  of  an  input  file  is  reached.  This 
parameter  is  not  read  if  the  -p,  -r,  or  -s  option  is 
specified. 

make_complex - string

A value of "YES" or "yes" means join pairs of real
channels to form complex channels as if the -J option
is in force. A value of "NO" or "no" means make a
separate output channel for each input channel as
usual. No other values are allowed. This parameter is
not read if the -J option is specified. The default
value is "NO".

ESPS  COMMON 

The  ESPS  Common  file  is  not  read. 

If  Common  processing  is  enabled,  and  the  output  file  is  not 
standard  output,  the  program  writes  the  Common  parameters 
prog, filename, start, and nan to record the program's name,
the  name  of  the  output  file,  the  starting  record  number  of 
the  output  file  (always  1) ,  and  the  number  of  points  in  the 
output  file. 

ESPS Common processing may be disabled by setting the
environment variable USE_ESPS_COMMON to off. The default
ESPS Common file is .espscom in the user's home directory.
This may be overridden by setting the environment variable
ESPSCOM to the desired path. User feedback of Common
processing is determined by the environment variable
ESPS_VERBOSE, with 0 causing no feedback and increasing
levels causing increasingly detailed feedback. If
ESPS_VERBOSE is not defined, a default value of 3 is
assumed.

ESPS  HEADERS 

The output header is a new FEA_SD file header, with
appropriate  items  copied  from  the  input  headers. 

The generic header item "record_freq", which must have the
same value in all input files, is copied into the output
header.

The generic header item "start_time" records the starting
time for each output channel. It is a single number if all
output channels have the same starting time; otherwise it is
a vector with one element per channel. The starting time
for a channel is its starting time in the input file plus an
offset in case the data taken from the input file do not
start with the first record. The offset is given by (r-1)/f,
where r is the starting record number in the input file and
f is the sampling frequency given by the "record_freq"
header item. The -J option can create complex channels
whose real and imaginary parts have inconsistent starting
times. When that happens, a warning message is printed, and
the starting time for the real part is recorded in the
"start_time" header item.

If  every  input  file  has  a  "max_value"  header  item,  then  the 
output file has a "max_value" header item containing the
same  information. 
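The "start_time" rule for the output header can be sketched in Python, assuming one single-channel input file per output channel, listed in output-channel order; the function is hypothetical. The two calls use the assumptions of the -s examples under EXAMPLES.

```python
def output_start_time(start_times, start_records, freq):
    """Starting time of each output channel: its input start time plus the
    offset (r - 1)/f for a starting record r. Returns a single number when
    all channels agree, otherwise a list with one element per channel."""
    times = [s + (r - 1) / freq for s, r in zip(start_times, start_records)]
    if all(t == times[0] for t in times):
        return times[0]
    return times


# 8000 Hz; input start times 0.0, 0.0, and 0.5 seconds.
print(output_start_time([0.0, 0.0, 0.5], [4001, 4001, 1], 8000.0))  # 0.5
print(output_start_time([0.0, 0.0, 0.5], [4001, 1, 1], 8000.0))     # [0.5, 0.0, 0.5]
```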

EXAMPLES 

Multiplex data from three input files to produce an output
file xxx. Processing begins with the sampled data in the
first  record  in  each  input  file.  The  output  file  has  the 
same  length  as  the  shortest  input  file. 

mux  aaa  bbb  ccc  xxx 

Start at time 0.5 in each input file and process 0.5 seconds
of data from each. (Suppose the sampling frequency is 8000
Hz, and the start times in the three input files are 0.0,
0.0, and 0.5. Then the starting record numbers are 4001,
4001, and 1, respectively. The start time in the output
file is 0.5 for all channels.)

mux  -s0.5:1.0  aaa  bbb  ccc  xxx 

Start at time 0.5 in file aaa and with the first record in
the other two input files. (With the assumptions of the
previous example, the starting record numbers in the three
input files are 4001, 1, and 1, respectively. The start
times in the output file header are 0.5 for data from files
aaa and ccc and 0.0 for data from file bbb.)

mux -s0.5: -p1: aaa bbb ccc xxx

Juxtapose data from an input file with a test signal and
pass the result to another program.

testsd - | mux aaa - - | more_processing -

If  aaa  has  two  channels  of  real  data,  this  will  convert  it 
to  a  single-channel  file  zzz  of  complex  data. 

mux  -J  aaa  zzz 

If aaa and bbb are single-channel files of real data, this
will  join  them  into  a  single-channel  file  of  complex  data. 

mux  -J  aaa  bbb  zzz 

Multiplex  a  portion  of  a  file  with  a  later  portion  of  the 
same  file. 

mux -p1001:2000 -p2501: aaa aaa xxx

SEE  ALSO 

demux(1-ESPS), copysd(1-ESPS), addgen(1-ESPS), FEA_SD(5-ESPS),
FEA(5-ESPS)

DIAGNOSTICS 

The program exits with an error message if any of the
following occur.

The command line cannot be parsed.

Fewer than two file names are specified (one in, one out).
Fewer input file names are specified than -p, -r, and -s
options.

More than one input file name is "-".

An  input  file  cannot  be  opened  or  is  not  an  ESPS  sampled- 
data  file. 

The  input  files  do  not  all  have  the  same  sampling 
frequency. 

The  input  files  do  not  all  have  the  same  data  type. 

The  -J  option  is  specified  with  input  files  of  a  complex 
data  type. 

The  -J  option  is  specified,  and  the  total  number  of  input 
channels  is  odd. 

A  starting  record  specified  with  a  -p,  -r,  or  -s  option 
does  not  exist  in  all  the  files  that  the  option  applies  to. 

The  program  issues  a  warning  message  if  the  end  of  a  range 
specified  by  a  -p,  -r,  or  -s  option  is  not  reached,  and  the 
option  argument  (see  the  Options  section)  ends  with  an 
explicit  last,  end,  or  +incr.  (This  doesn't  apply  to  option 
arguments  that  default  to  end-of-file  by  omitting  what 
follows  the  colon.)  The  end  of  the  range  may  fail  to  be 
reached  either  because  the  end  of  an  input  file  is  reached 
first  or  because  another  -p,  -r,  or  -s  option  causes  an 
earlier  stop. 

The program issues a warning message if processing for the
-J option joins two channels that are not properly
time-aligned (so that they would require conflicting entries
in the output "start_time" header item).

BUGS 

The  -s  option  is  not  implemented  in  this  version  of  the 
program. 

AUTHOR 

Manual page by Rodney Johnson. Program by Alan Parker.


D.5  S32cplay 




Manual  Pages  for  the  ESPS  Command  "s32cplay" : 

NAME 

s32cplay - send sampled data (PCM) to an Ariel dsp32c
S-bus/ProPort D/A converter.

SYNOPSIS 

s32cplay [ -r range ] [ -s start time ] [ -e end time ] [ -f
sample rate ] [ -c channel ] [ -x debug-level ] [ -H ] [[ -i
] file ] [ more-files ]

DESCRIPTION 

S32cplay  sends  all  or  a  portion  of  one  or  more  ESPS,  SIGnal, 
NIST  or  headerless  sampled  data  files  to  a  S-32C/ProPort 
digital-to-analog  converter.  A  subrange  of  data  within  the 
files may be chosen; this subrange may be specified in
seconds or sample points. Dual-channel (stereo) or
single-channel (monaural) data may be converted.
Single-channel input data may be directed to either or both
output channels.

Playback  may  be  stopped  by  sending  the  terminal's  interrupt 
character  (normally  control-C)  after  playback  has  started. 

If "-" is given for a filename, then the input is taken from
standard input and must be an ESPS file or a headerless file
(i.e., SIGnal or NIST/Sphere files cannot be used with
standard input).

OPTIONS 

The  following  options  are  supported: 

-r  range 

Select a subrange of points to be played, using the
format start-end, start:end, or start:+count. Either
the start or the end may be omitted; the beginning or
the end of the file is used if no alternative is
specified.

If  multiple  files  were  specified,  the  same  range  from 
each  file  is  played. 
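A hedged sketch of how an -r argument might be resolved to sample points. Two details are assumptions, not documented behavior: points are numbered from 1, and start:+count is read as last = start + count, following the convention the mux page uses for its analogous forms. Clamping the end to the last point follows the DIAGNOSTICS section. The function name is hypothetical.

```python
def parse_point_range(arg, n_points):
    """Resolve a -r argument ("start-end", "start:end", or "start:+count";
    either end may be omitted) to 1-based (first, last) sample points."""
    sep = ":" if ":" in arg else "-"
    start_text, found, end_text = arg.partition(sep)
    if not found:
        raise ValueError("expected start-end, start:end, or start:+count")
    first = int(start_text) if start_text else 1
    if end_text.startswith("+"):
        last = first + int(end_text[1:])  # assumed: last = start + count
    elif end_text:
        last = int(end_text)
    else:
        last = n_points  # omitted end: play to the end of the file
    if first > n_points:
        raise ValueError("starting point is past the end of the file")
    # DIAGNOSTICS: an end past the last point is reset to the last point.
    return first, min(last, n_points)


print(parse_point_range("1001:+999", 32000))  # (1001, 2000)
print(parse_point_range(":40000", 32000))     # (1, 32000)
```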


-s  start  time 

Specify  the  start  time  in  seconds.  Play  will  continue 
to  the  end  of  file  or  the  end  time  specified  with  -e. 

-s  may  not  be  used  with  -r. 

-e  end  time 

Specify  the  playback  end  time  in  seconds.  Play  will 
start  at  the  beginning  of  file  or  the  time  specified  by 
-s.  -e  may  not  be  used  with  -r. 

-f  frequency 

Specifies  the  sampling  frequency.  The  closest 
frequency  to  that  requested  will  be  selected  from  those 
available  and  the  user  will  be  notified  if  the  selected 
value differs from that requested. If -f is not
specified, the sampling frequency in the header is
used; for headerless files the default is 16kHz
(assuming the standard Ariel crystal).
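Choosing the closest available rate amounts to a nearest-value search over the rate table. The sketch below uses a hypothetical table; the real rates live in ESPS_BASE/s32cbin/s32c0.srtable and depend on the installed crystals. In this sketch ties go to the earlier table entry, which may not match s32cplay.

```python
import sys

# Hypothetical rate table, for illustration only.
EXAMPLE_RATES = [8000, 16000, 32000, 48000]


def select_rate(requested, supported=EXAMPLE_RATES):
    """Pick the supported rate closest to the request, warning on mismatch."""
    chosen = min(supported, key=lambda rate: abs(rate - requested))
    if chosen != requested:
        print(f"warning: {requested} Hz not available, using {chosen} Hz",
              file=sys.stderr)
    return chosen


print(select_rate(22050))  # 16000, with a warning on stderr
```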

-c  channel 

Select the output channel configuration. For files
with headers, the default is to play stereo if the file
is stereo and to provide identical output on both
channels if the file is single-channel. If the file
has a header and is single-channel, acceptable
arguments to -c are 0 or 1 to select channel A or B,
respectively, for output. If the file has no header,
the default is to assume single-channel data and
provide identical output to both channels. For
headerless files, acceptable arguments to -c are 0
(output on channel A), 1 (output on channel B), or 2
(stereo data, stereo output).

-H  Force  s32cplay  to  treat  the  input  as  a  headerless  file. 
This  is  probably  unwise  to  use  unless  the  gain  on  your 
loudspeaker  or  earphones  is  way  down,  since  a  file  that 
really  does  have  a  header,  or  a  file  composed  of  data 
types  other  than  shorts  (of  the  correct  byte  order!) 
will  cause  a  terrible  sound. 

-i  input  file 

Specify a file to be D/A converted. Use of -i before
the file designation is optional if the filename is the
last command-line component. If no input file is
specified, or if "-" is specified, input is taken from
stdin.

-x debug_level

Setting debug_level nonzero causes several messages to
be  printed  as  internal  processing  proceeds.  The 
default  is  level  0,  which  causes  no  debug  output. 

INTERACTION  WITH  XWAVES 

S32cplay is designed to optionally use the server mode of
xwaves(1-ESPS). This is especially handy when s32cplay is
used as an xwaves external play command (e.g., by setting the
xwaves global play_prog). When the latter is the case, play
commands initiated via xwaves' menu operations may be
interrupted by pressing the left mouse button in the data
view. Xwaves will send a signal 30 (SIGUSR1) to the play
program. S32cplay responds to this by sending back to
xwaves a command "set da_location xx", where xx is the
sample that was being output when play was interrupted.

This  setting,  in  conjunction  with  xwaves'  built-in  callback 
procedure  for  handling  child-process  exits,  causes  the 
xwaves  signal  display  to  center  itself  on  the  sample  where 
play  was  halted. 

The SIGUSR1 signal to terminate s32cplay may come from any
source. If it comes from sources other than xwaves, the
environment variables WAVES_PORT and WAVES_HOST must be
correctly defined (see espsenv(1-ESPS)) for correct
functioning of the xwaves view positioning. (Of course,
xwaves  must  actually  be  displaying  the  signal  in  question  at 
the  time  and  xwaves  must  have  initiated  the  play.) 

S32cplay may also be interrupted with kill -2 (SIGINT) or
kill -3 (SIGQUIT). These signals are caught gracefully and
s32cplay halts immediately, but no message is sent to
xwaves. No message is sent if the play operation finishes
without interruption.

ESPS  PARAMETERS 

The  parameter  file  is  not  read. 




ESPS  COMMON 

ESPS  Common  is  not  read  or  written. 

DIAGNOSTICS 

S32cplay  informs  the  user  if  the  input  file  does  not  exist, 
if  inconsistent  options  are  used,  or  if  an  unsupported 
sample  rate  is  requested.  Also  see  WARNINGS  below. 

If  the  starting  point  requested  is  greater  than  the  last 
point  in  the  file,  then  a  message  is  printed.  If  the  ending 
point  requested  is  greater  than  the  last  point  in  the  file, 
it  is  reset  to  the  last  point  and  processing  continues. 

WARNINGS 

S32cplay supports only the "native" Ariel ProPort sampling
rates. These are tabulated in the Ariel User's Manual.

Note  that  optional  crystals  are  available  from  Ariel  that 
will  provide  a  different  selection  of  frequencies.  Should 
the crystal(s) be changed, it will be necessary to edit
ESPS_BASE/s32cbin/s32c0.srtable  to  reflect  the  new 
selection.  If  you  play  a  file  that  is  sampled  at  an 
unsupported  rate,  s32cplay  plays  the  data  at  the  closest 
supported  rate  and  issues  a  warning. 

S32cplay  provides  stereo  D/A  conversion  at  rates  up  to  at 
least 48kHz when playing from local disk. Playback from
network disks is often feasible as well. The maximum rate
over the network is unpredictable in general, but we
routinely achieve 16kHz stereo at Entropic Research
Laboratory. Of course, rate limitations due to network speed
will be less severe for single-channel playing. Obviously,
processes  supplying  input  to  s32cplay  on  a  pipe  must  be  able 
to  keep  up  with  the  average  aggregate  sampling  frequency. 
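The "average aggregate sampling frequency" is the per-channel rate times the number of channels. A small worked calculation, assuming 16-bit samples (the shorts mentioned under -H), shows the throughput a pipe or network disk has to sustain; the function name is illustrative.

```python
def aggregate_rate(sample_rate_hz, channels, bytes_per_sample=2):
    """Samples per second and bytes per second a pipe or disk must sustain."""
    samples_per_second = sample_rate_hz * channels
    return samples_per_second, samples_per_second * bytes_per_sample


print(aggregate_rate(48000, 2))  # (96000, 192000): 48kHz stereo
print(aggregate_rate(16000, 2))  # (32000, 64000): 16kHz stereo
print(aggregate_rate(16000, 1))  # (16000, 32000): single channel needs half
```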

Note  that  s32cplay  requires  that  the  dsp32c  program  called 
"ar_atod"  be  located  in  ESPS_BASE/s32cbin.  If  it  is  located 
anywhere  else,  the  environment  variable  ARIELS32C_BIN_PATH 
must  be  set  accordingly. 


FILES 

ESPS_BASE/s32cbin/ar_atod
the dsp32c program that runs on the Ariel S-32C card.


ESPS_BASE/s32cbin/s32c0.srtable
the table of available sampling rates (dependent on crystal
frequencies).


BUGS 

If a readable header is present but -H is specified, the
header is treated like sampled data, usually resulting in
very unpleasant sounds.

If an s32cplay is in progress when another S32C card
operation is requested on the same machine, unpredictable
results will occur.

There  is  currently  no  reliable  notification  in  the  event  of 
loss  of  realtime. 

EXPECTED  CHANGES 

Implement  a  locking  mechanism  to  prevent  collision  of 
multiple  simultaneous  attempts  to  use  the  Ariel  board. 

Implement  a  robust  check  for  loss  of  real-time  operation. 

SEE  ALSO 

SD(5-ESPS), testsd(1-ESPS), copysd(1-ESPS), s32crecord(1-ESPS),
sfconvert(1-ESPS), s32csgram(1-ESPS), send_xwaves2(3-ESPS)

AUTHORS 

David  Talkin  at  Entropic  Research  Laboratory  and  Michael 
McCandless at MIT Laboratory of Computer Science.


D.6 S32crecord

Manual  Pages  for  the  ESPS  Command  "s32crecord" : 

NAME 

s32crecord  -  mono  or  stereo  record  to  disk  or  pipe  for  Ariel 
dsp32c  S-bus/ProPort  A/D  converter. 




SYNOPSIS 

s32crecord [ -s duration ] [ -f sample rate ] [ -c channel ]
[ -S ] [ -W xwaves display args ] [ -P ] [ -p prompt
string ] [ -X debug-level ] [ -H ] [[ -o ] file ]

DESCRIPTION 

S32crecord  provides  mono  or  stereo  sampling  at  selected 
rates  up  to  at  least  48kHz  when  recording  to  local  disk. 
Direct  recording  onto  network  disks  is  often  feasible  as 
well. Output files have ESPS FEA_SD headers or,
optionally, no headers. Output may optionally be directed
to stdout. S32crecord has special adaptations that permit
tight coupling with xwaves (see INTERACTION WITH XWAVES
below).

When  no  output  file  is  specified,  s32crecord  will,  by 
default,  write  a  FEA_SD  file  to  standard  output.  When 
stereo recording is selected (see the -c option), channels 0
and  1  alternate  in  the  file,  with  channel  0  being  first. 
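Reading such interleaved data back means taking every other sample. Below is a sketch for a headerless (-H) recording, assuming 16-bit samples in big-endian order as on a SPARC host; pass "<" for little-endian data. The function name is illustrative.

```python
import struct


def split_stereo(raw, byte_order=">"):
    """Split interleaved 16-bit stereo samples (channel 0 first) into two
    lists. byte_order is a struct prefix: ">" big-endian, "<" little-endian."""
    count = len(raw) // 2  # whole 16-bit samples; a trailing odd byte is dropped
    samples = struct.unpack(f"{byte_order}{count}h", raw[: 2 * count])
    return list(samples[0::2]), list(samples[1::2])


raw = struct.pack(">6h", 10, -10, 20, -20, 30, -30)
print(split_stereo(raw))  # ([10, 20, 30], [-10, -20, -30])
```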

Note  that  processes  consuming  the  output  of  s32crecord  on  a 
pipe  must  be  able  to  keep  up  with  the  average  aggregate 
sampling  frequency.  Options  are  available  to  control  the 
sampling  rate,  recording  duration,  prompting,  header 
suppression,  and  immediate  display  by  xwaves. 

OPTIONS 

The  following  options  are  supported: 

-s duration [10]

Specifies the maximum duration of the recording session
in seconds. Recording may be interrupted before this
time expires with SIGINT, SIGQUIT, or SIGUSR1 (see
below). The default duration is 10 seconds. The upper
limit is set only by disk space.

-f  frequency  [16000] 

Specifies the sampling frequency. The closest
frequency to that requested will be selected from those
available and the user will be notified if the selected
value differs from that requested. If -f is not
specified, the default sampling frequency of 16kHz is
used (assuming the standard Ariel crystal).

-c channel [1]

By default s32crecord samples and stores single-channel
data from channel B. Alternate-channel recording is
selected by setting channel to 0 for channel A, to 1
for channel B, or to 2 for two-channel (stereo)
recording.

-S Enable the xwaves(1-ESPS) "make" command via
send_xwaves2(3-ESPS) when the requested recording time
has elapsed or when recording is interrupted. This
permits immediate examination of the recorded passage
using xwaves. See INTERACTION WITH XWAVES below.

-W The argument to this option will be appended to the
send_xwaves "make" command to permit display
customization (e.g., via window location and size
specifications). See INTERACTION WITH XWAVES below.

-P Enable a prompt message when A/D has actually
commenced. The default message is a "bell ring" and
the text "Start recording now ...." This prompt may be
changed with the -p option.

-p prompt string

Prompt  string  will  be  used  as  the  alert  that  recording 
is  commencing.  Specifying  -p  forces  -P. 

-H Suppresses header creation. A "bare" sample stream
will result. The default is to produce an ESPS FEA_SD
file with one or two channels depending on the -c
option.

-X debug_level

Setting  debug_level  nonzero  causes  several  messages  to 
be  printed  as  internal  processing  proceeds.  The 
default  is  level  0,  which  causes  no  debug  output. 

-o output

Specifies a file for output. Use of -o before the file
designation is optional if the filename is the last
command-line component. If no output file is specified,
or if "-" is specified, output will go to stdout.

INTERACTION  WITH  XWAVES 

S32crecord is designed to optionally use the server mode of
xwaves(1-ESPS) for display of its output file on completion
of the record operation. This is implemented using
send_xwaves2(3-ESPS). The following conditions must be
met for this feature to work. (1) Xwaves must be running in
the server mode. (2) The environment variables WAVES_PORT
and WAVES_HOST must be correctly defined (see
espsenv(1-ESPS)). (3) The record operation must be
interrupted with a SIGUSR1 signal (e.g., via "kill -30 pid,"
where pid is the process ID of the s32crecord process), or,
if s32crecord is not thus interrupted, the -S flag must have
been set. (4) Output must be to a file.

An example mbuttons(1-ESPS) script to implement a primitive
record control panel follows:

"RECORD Chan0" exec s32crecord -c0 -P -s60 -S -W"name $$ loc_y 150" \
        xx$$& echo $! > foo
"RECORD Chan1" exec s32crecord -c1 -P -s60 -S -W"name $$ loc_y 150" \
        xx$$& echo $! > foo
"RECORD Stereo" exec s32crecord -c2 -P -s60 -S -W"name $$ loc_y 150" \
        xx$$& echo $! > foo
"STOP" kill -30 `cat foo`
"ERASE" f=`cat foo` ; k=`echo $f 1 - p q | dc` ; \
        kill -2 $f ; rm -f xx$k ; send_xwaves kill name $k

Note how the -W option is used to name the display ensemble
and to fix the vertical location of the waveform at the same
place on consecutive invocations. In general, the -W option
can be used to augment the display generation as described
under the "make" command in the xwaves manual. Note that
the "STOP" function is implemented with a "kill -30"
(SIGUSR1). This causes s32crecord to send the "make"
command to xwaves. If either kill -2 (SIGINT) or kill -3
(SIGQUIT) is sent to s32crecord, it will terminate
gracefully, but will not send any messages to xwaves. The
-S option causes the xwaves display operation to occur even
in the non-interrupted case (i.e., after 60 sec of
recording). The above script is not robust, but may serve
as a useful starting point for more serious attempts.

ESPS  PARAMETERS 

The parameter file is not read.

ESPS  COMMON 

ESPS  Common  is  not  read  or  written. 

WARNINGS 

Note  that  s32crecord  requires  that  the  dsp32c  program  called 
"ar_atod"  be  located  in  ESPS_BASE/s32cbin.  If  it  is  located 
anywhere  else,  the  environment  variable  ARIELS32C_BIN_PATH 
must  be  set  accordingly. 

When  output  is  to  a  file,  the  ESPS  header,  if  it  is  present, 
will  correctly  reflect  the  absolute  maximum  sample  value 
encountered during recording and the number of samples
recorded.  If  output  is  to  a  pipe,  these  values  are  not 
recorded  in  the  header. 

If  the  output  of  s32crecord  goes  to  the  terminal,  bad  things 
will  happen  to  the  terminal  configuration  and  you  may  not  be 
able  to  regain  control  of  the  terminal.  In  this  case,  kill 
the  window  or  kill  the  record  process  remotely  from  another 
window. 

S32crecord  supports  only  the  "native"  Ariel  ProPort  sampling 
rates.  These  are  tabulated  in  the  Ariel  User's  Manual. 
Note  that  optional  crystals  are  available  from  Ariel  that 
will  provide  a  different  selection  of  frequencies.  Should 
the  crystal(s)  be  changed,  it  will  be  necessary  to  edit 
ESPS_BASE/s32cbin/s32c0.srtable  to  reflect  the  new 
selection.  If  an  unsupported  rate  is  requested,  s32crecord 
records  at  the  closest  supported  rate  and  issues  a  warning. 

The maximum rate over the network is unpredictable in
general,  but  we  routinely  achieve  16kHz  stereo  at  Entropic 
Research  Laboratory.  Of  course  rate  limitations  due  to 
network  speed  will  be  less  severe  for  single-channel 
recording. 


FILES 




ESPS_BASE/s32cbin/ar_atod 

the  dsp32c  program  that  runs  on  the  Ariel  S-32C  card. 

ESPS_BASE/s32cbin/s32c0.srtable
the table of available sampling rates (dependent on crystal
frequencies).


BUGS 

If an s32cplay or another s32crecord operation is started
while  one  is  in  progress,  unpredictable  results  will  occur. 

There  is  currently  no  reliable  notification  in  the  event  of 
loss  of  realtime. 

EXPECTED  CHANGES 

Implement a locking mechanism to prevent collision of
multiple  simultaneous  attempts  to  use  the  Ariel  board. 

Implement  a  robust  check  for  loss  of  real-time  operation. 

SEE  ALSO 

SD(5-ESPS), testsd(1-ESPS), copysd(1-ESPS), s32cplay(1-ESPS),
sfconvert(1-ESPS), s32csgram(1-ESPS)

AUTHORS 

David Talkin at Entropic Research Laboratory and Michael
McCandless  at  MIT  Laboratory  of  Computer  Science. 




Appendix  E.  Plots  of  Sample  HRTF  Filters 


The following filters are examples of those used in each simulation completed
in this thesis. The angles are given by the orientation shown in figure 1 and figure 2.




Bibliography


1. Abbagnaro, Louis A., et al. “Measurements of Diffraction and Interaural Delay of a Progressive Sound Wave Caused by the Human Head,” Journal of the Acoustical Society of America, 58:693-700 (September 1975).

2.  Allen,  Jont  B.  and  David  A.  Berkley.  “Image  Method  for  Efficiently  Simulating 
Small-Room  Acoustics.”  Journal  of  the  Acoustical  Society  of  America,  65:943- 
950  (April  1979). 

3. Blauert, Jens. Spatial Hearing. MIT Press, 1983.

4. Blauert, Jens. “An Introduction to Binaural Technology.” Conference on Binaural and Spatial Hearing. 1993.

5. Borish, Jeffrey. “Extension of the Image Model to Arbitrary Polyhedra,” Journal of the Acoustical Society of America, 85:787-796 (February 1984).

6.  Clifton,  Rachel.  “Context  and  Learning  Affect  Echo  Perception.”  Conference 
on  Binaural  and  Spatial  Hearing.  1993. 

7. Colombi, John. “Dichotic Listening,” Unpublished (1993).

8. Durlach, Nathaniel and Barbara Shinn-Cunningham. “Auditory Displays and Localization.” Conference on Binaural and Spatial Hearing. 1993.

9. Ericson, Marc (Electronics Engineer, AAMRL). “Personal Interview.” Unpublished. August 1993.

10.  Genuit,  Klaus  and  Mahlon  Burkhard.  “Artificial  Head  Measurement  Systems 
for  Subjective  Evaluation  of  Sound  Quality,”  Journal  of  Sound  and  Vibration, 
18-23  (March  1992). 

11. Gierlich, H.W. “The Application of Binaural Technology,” Applied Acoustics, 36:219-243 (March 1992).

12.  Hafter,  Ervin.  “Binaural  Precedence  and  Other  Onset  Effects.”  Conference  on 
Binaural  and  Spatial  Hearing.  1993. 

13.  Hartmann,  William  Morris.  “Listening  in  a  Room  and  the  Precedence  Effect.” 
Conference on Binaural and Spatial Hearing. 1993.

14. Knudsen, Vern O. and Cyril M. Harris. Acoustical Designing in Architecture.
American  Institute  of  Physics,  1978. 

15. Konishi, Masakazu. “Listening With Two Ears,” Scientific American, 66-73
(April  1993). 

16. Krokstad, A., et al. “Calculating the Acoustical Room Response by the Use of the Ray Tracing Technique,” Journal of Sound and Vibration, 8:118-125 (1968).


17. Kuhn, George F. “Model for the Interaural Time Differences in the Azimuthal Plane,” Journal of the Acoustical Society of America, 62:157-167 (July 1977).

18. Lehnert, Hilmar and Jens Blauert. “Principles of Binaural Room Simulation,” Applied Acoustics, 36:259-291 (March 1992).

19. McKinley, Richard L. Concept and Design of an Auditory Localization Cue Synthesizer. MS thesis. Air Force Institute of Technology (AU), 1988.

20. McKinley, Richard L. “Flight Demonstration of a 3-D Audio Display.” Conference on Binaural and Spatial Hearing. 1993.

21. Middlebrooks, John. “A Computational Model of Narrowband Sound Localization.” Conference on Binaural and Spatial Hearing. 1993.

22. Middlebrooks, John C. “Directional Sensitivity of Sound-Pressure Levels in the Human Ear Canal,” Journal of the Acoustical Society of America, 86:89-108 (July 1989).

23. Moller, Henrik. “Fundamentals of Binaural Technology,” Applied Acoustics, 36:171-218 (March 1992).

24.  Morse,  Philip  M.  Vibration  and  Sound,  McGraw-Hill  Book  Company,  Inc., 
1948. 

25. Morse, Philip M. and K. Uno Ingard. Theoretical Acoustics. McGraw-Hill Book
Company,  Inc.,  1968. 

26. Ondet, A. M. and J. L. Barby. “Modeling of Sound Propagation in Fitted Workshops Using Ray-tracing,” Journal of the Acoustical Society of America, 85:787-796 (February 1989).

27. Pierce, Allan D. Acoustics: An Introduction to Its Physical Principles and
Applications.  McGraw-Hill  Book  Company,  Inc.,  1981. 

28. Rayleigh, John William Strutt (Third Baron). Theory of Sound, I. Dover
Publications,  1907. 

29. Rayleigh, John William Strutt (Third Baron). Theory of Sound, II. Dover Publications, 1907.

30. Richardson, E.G. Sound. Edward Arnold and Co., 1927.

31.  Sayers,  Bruce  and  Colin  E.  Cherry.  “Mechanism  of  Binaural  Fusion  in  the 
Hearing of Speech,” Journal of the Acoustical Society of America, (1976).

32. Scarborough, Captain Eric L. Enhancement of Audio Localization Cue Synthesis by Adding Environmental and Visual Cues. MS thesis. Air Force Institute of Technology (AU), 1992.

33. Shaw, Edgar A.G. “Acoustical Characteristics of the Human External Ear.” Conference on Binaural and Spatial Hearing. 1993.




34. Stephens, R. W. B. and A. E. Bate. Acoustics and Vibrational Physics. Edward Arnold Publishers, Ltd., 1950.

35. Stewart, George Walter. Introductory Acoustics. D. Van Nostrand Company, Inc., 1933.

36. Sytronics, Inc. Virtual Acoustics for Pilot Displays. Technical Report, Sytronics, Inc., September 1992.

37. Vorlander, Michael. “Simulation of the Transient and Steady-State Sound Propagation in Rooms Using a New Combined Ray-tracing/Image-source Algorithm,” Journal of the Acoustical Society of America, 86:172–178 (July 1989).

38. Wenzel, Elizabeth. “Research on Virtual Acoustic Environments at NASA,” Conference on Binaural and Spatial Hearing. 1993.

39. Wenzel, Elizabeth M. “Localization in Virtual Acoustic Displays,” Presence, 1 (1992).

40. Wightman, Frederic L. and Doris J. Kistler. “Headphone Simulation of Free-Field Listening. II: Psychophysical Validation,” Journal of the Acoustical Society of America, 85 (1989).

41. Wightman, Frederic L. and Doris J. Kistler. “Headphone Simulation of Free-Field Listening. I: Stimulus Synthesis,” Journal of the Acoustical Society of America, 85 (1989).

42. Wightman, Frederic L. and Doris J. Kistler. “Factors Affecting Relative Importance of Sound Localization Cues,” Conference on Binaural and Spatial Hearing. 1993.

43. Winstanley, J. W. Textbook on Sound. Longmans, Green and Co., 1952.


Vita 

Brian A. Smith was born on 15 November 1967 in Beech Grove, Indiana. In 1986, he graduated from Greenfield-Central High School and accepted an appointment to the United States Air Force Academy. In 1990, he was commissioned out of the Academy and received a Bachelor of Science in Chemistry. After graduation, Lt. Smith attended Undergraduate Pilot Training (UPT) at Reese AFB, TX. He graduated from UPT in November 1991 and received a “banked” assignment to Wright-Patterson AFB. While stationed at WPAFB to earn a master's degree, Lt. Smith also took part in several experiments at Armstrong Laboratory, serving as a test subject in high-G experiments and in motion sickness experiments. Following the completion of the AFIT master's program, Lt. Smith returned to flying status.


Permanent  address:  1901  Roop  Ct 

Kettering,  OH  45420 




REPORT  DOCUMENTATION  PAGE 

Form Approved

OMB No. 0704-0188

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.

1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE: December 1993

3. REPORT TYPE AND DATES COVERED: Master's Thesis

4. TITLE AND SUBTITLE

BINAURAL  ROOM  SIMULATION 

5.  FUNDING  NUMBERS 

6.  AUTHOR(S) 

Brian  A.  Smith 

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Air  Force  Institute  of  Technology,  WPAFB  OH  45433-6583 

8.  PERFORMING  ORGANIZATION 

REPORT  NUMBER 

AFIT/GAM/ENC/93D-1

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)

AL/CFBA 

Attn:  Mr.  Mark  A.  Ericson 

2610  Seventh  St. 

Bldg.  441 

WPAFB, OH

10.  SPONSORING /MONITORING 

AGENCY  REPORT  NUMBER 

11.  SUPPLEMENTARY  NOTES 

12a.  DISTRIBUTION /AVAILABILITY  STATEMENT 

Distribution  Unlimited 

12b.  DISTRIBUTION  CODE 

13.  ABSTRACT  (Maximum  200  words) 

Abstract 

Research in binaural and spatial hearing is of particular interest to the Air Force. Applications in cockpit communication, target recognition, and aircraft navigation are being explored. This thesis examines human auditory localization cues and develops a mathematical model for the transfer function of a sound signal traveling from an isotropic point source through a rectangular room to both ears of a listener. Using this model as a guide, non-head-coupled binaural sound signals are generated in a binaural room simulation. The reflection and attenuation cues included in the computer-generated signals are varied to determine which cues enhance the listener's degree of extracranialization. Results of this research indicate that adding three or more attenuated reflections to a non-head-coupled binaural signal provides the listener with a binaural sound that is localized extracranially.

14.  SUBJECT  TERMS 

Binaural Sound, Binaural Room Simulation, Auditory Localization Cues, Reflections, Attenuation, Extracranialized Sound, Head Coupling

15.  NUMBER  OF  PAGES 

127 

16.  PRICE  CODE 

17.  SECURITY  CLASSIFICATION 
OF  REPORT 

UNCLASSIFIED 

18.  SECURITY  CLASSIFICATION 

OF  THIS  PAGE 

UNCLASSIFIED 

19.  SECURITY  CLASSIFICATION 

OF  ABSTRACT 

UNCLASSIFIED 

20.  LIMITATION  OF  ABSTRACT 

UL 

NSN 7540-01-280-5500


Standard  Form  298  (Rev.  2-89) 

Prescribed by ANSI Std. Z39-18
298-102 


